virtual presence


Page 1: VIRTUAL  PRESENCE

VIRTUAL PRESENCE

Authors: Voislav Galić, Dušan Zečević, Đorđe Đurđević, Veljko Milutinović

http://galeb.etf.bg.ac.yu/~vm/tutorial

Page 2: VIRTUAL  PRESENCE

Voislav Galić, Dušan Zečević, Đorđe Đurđević, Veljko Milutinović 2/99

DEFINITION

Virtual presence is a term with different shades of meaning in different industries, but its essence remains constant: it is a new tool that enables a form of telecommunication in which individuals may substitute their physical presence with an alternative, typically electronic, presence

Page 3: VIRTUAL  PRESENCE

SUMMARY

- Introduction to Virtual Presence

- Data Mining for Virtual Presence

- A New Software Paradigm

- Selected Case Studies

Page 4: VIRTUAL  PRESENCE

INTRODUCTION TO VP

- Definitions

- VP applications

- Psychological aspects

Page 5: VIRTUAL  PRESENCE

DATA MINING FOR VP

- Why Data Mining?

- What can Data Mining do?

- Growing popularity of Data Mining

- Algorithms

Page 6: VIRTUAL  PRESENCE

SOFTWARE AGENTS

- A new software paradigm

- Standardization

- FIPA specifications

- Agent management

- Agent Communication Language

Page 7: VIRTUAL  PRESENCE

GoodNews (CMU*)

* Carnegie Mellon University, Pittsburgh, USA

- Categorization of financial news articles

- Co-located phrases

- Domain Experts

- Implementation and results

Page 8: VIRTUAL  PRESENCE

iMatch (MIT*)

- The idea
  - associate MIT students and staff in order to ease their cooperation
  - help students find the resources they need

- Implementation
  - advanced, agent-based system architecture

- Tomorrow?

* Massachusetts Institute of Technology, USA

Page 9: VIRTUAL  PRESENCE

“Tourist city” (ETF*)

• A qualitative step forward in the domain of maximization of customer satisfaction

• Technologies:
  • Data Mining
  • Software Agents (mobile)

* Faculty of Electrical Engineering, University of Belgrade, Serbia and Montenegro

Page 10: VIRTUAL  PRESENCE

CONCLUSION

This tutorial will attempt to familiarize you with:

- The concept of VP (Virtual Presence) as a new technological challenge

- The new paradigms and technologies that will bring VP to everyday life:
  - Data Mining
  - Software Agents

Page 11: VIRTUAL  PRESENCE

INTRODUCTION

Virtual presence will arguably be one of the most important aspects of personal communication in the twenty-first century

Page 12: VIRTUAL  PRESENCE

Essence of VP

• The usefulness and reliability of virtual presence

• The ability to conduct everyday tasks by being virtually or electronically present

Page 13: VIRTUAL  PRESENCE

How to Accomplish It?

• Presence is accomplished through the Internet, video, or other communications, perhaps even psychically one day

• Technological advances will make virtual presence more sophisticated, altering the very meaning of the word “presence”

Page 14: VIRTUAL  PRESENCE

VP Applications

• VP in government

– “Sunshine laws”

– Voting

Page 15: VIRTUAL  PRESENCE

VP Applications

• VP in business

– Online board meetings

– Shareholder voting online

Page 16: VIRTUAL  PRESENCE

VP Applications

• VP in education

– interactive lectures and courses

Page 17: VIRTUAL  PRESENCE

VP Applications

• VP in medicine

– Telemedicine
  • Diagnostics
  • Remote surgery

– Risks
  • Privacy

Page 18: VIRTUAL  PRESENCE

VP Applications

• VP in everyday life

– Telecommuting/Telework

– Software agents as our virtual “shadows”

Page 19: VIRTUAL  PRESENCE

Psychological Aspects

• Cyberspace and Mind

• Presence in Virtual Space

• Communal Mind and Virtual Community

Page 20: VIRTUAL  PRESENCE

DATA MINING

Knowledge discovery is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

Page 21: VIRTUAL  PRESENCE

Many Definitions

• Data mining is also called data discovery or knowledge discovery

• It is a process of inferring knowledge from large oceans of data

• A search for valuable information in large volumes of data

• Analyzing data from different perspectives and summarizing it into useful information

Page 22: VIRTUAL  PRESENCE

Why Data Mining?

• DM allows you to extract knowledge from historical data and predict outcomes of future situations

• Optimize business decisions and improve customers’ satisfaction with your services

• Analyze data from many different angles, categorize it, and summarize the relationships identified

• Reveal knowledge hidden in data and turn this knowledge into a crucial competitive advantage

Page 23: VIRTUAL  PRESENCE

What Can Data Mining Do?

• Identify your best prospects and then retain them as customers

• Predict cross-sell opportunities and make recommendations

• Learn parameters influencing trends in sales and margins

• Segment markets and personalize communications

etc.

Page 24: VIRTUAL  PRESENCE

The Power of Data Mining

• Having a database is one thing; making sense of it is quite another

• Data mining does not rely on narrow human queries to produce results, but instead uses AI-related technology and algorithms

• Inductive reasoning

• Using more than one type of algorithm to search for patterns in data

• Data mining usually produces more general (= more powerful) results than those obtained by traditional techniques

• Relational DB storage and management technology is adequate for data mining applications smaller than 50 gigabytes

Page 25: VIRTUAL  PRESENCE

Reasons for the Growing Popularity of Data Mining

• Growing Data Volume

• Low Cost of Machine Learning

• Limitations of Human Analysis

Page 26: VIRTUAL  PRESENCE

Tasks Solved by Data Mining

• Predicting

• Classification

• Detection of relations

• Explicit modeling

• Clustering

• Market basket analysis

• Deviation detection

Page 27: VIRTUAL  PRESENCE

Algorithms

• Generally, their complexity is around $O(n \log n)$, where $n$ is the number of records

• Data mining includes three major components, with corresponding algorithms:
– Clustering (Classification)
– Association Rules
– Sequential Analysis

Page 28: VIRTUAL  PRESENCE

Classification Algorithms

• The aim is to develop a description or model for each class in a database, based on the features present in a set of class-labeled “training data”

• Data Classification Methods:
– Statistical algorithms
– Neural networks
– Genetic algorithms
– Nearest neighbor method
– Rule induction
– Data visualization

Page 29: VIRTUAL  PRESENCE

Classification-rule Learning

• Data abstraction

• Classification-rule learning: finding rules or decision trees that partition given data into predefined classes
– Hunt’s method

• Decision tree building algorithms:
– ID3 / C4.5 algorithm
– SLIQ / SPRINT algorithm (IBM)

• Other algorithms
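To make the decision-tree bullet concrete, here is a minimal sketch of the attribute-selection step used by ID3-style tree builders, which pick the attribute with the highest information gain. The toy records, the attribute names (`outlook`, `windy`) and the function names are illustrative assumptions, not taken from the slides; C4.5, SLIQ and SPRINT use refined or different split criteria that are not shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3-style choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical class-labeled training data.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["play", "play", "stay", "stay"]
root = best_attribute(rows, labels, ["outlook", "windy"])
```

In this toy set `outlook` separates the two classes completely, so it is chosen as the root test; a full tree builder would then recurse on each partition.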

Page 30: VIRTUAL  PRESENCE

Parallel Algorithms

• Basic idea: N training data items are randomly distributed to P processors. All the processors cooperate to expand the root node of the decision tree

• There are two approaches for further progress (the remaining nodes):
– Synchronous approach
– Partitioned approach
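The basic idea can be illustrated with a small sequential simulation. The slide does not say how the processors cooperate; the sketch below assumes that each processor builds a local class histogram for its share of the data and that the histograms are then summed, which yields the same counts a single machine would compute for the root node. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def distribute(items, processors, seed=0):
    """Randomly distribute N training items over P simulated processors."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::processors] for i in range(processors)]


def local_counts(partition, attribute):
    """Class histogram per attribute value, computed on one processor."""
    return Counter((row[attribute], label) for row, label in partition)


def global_counts(partitions, attribute):
    """Sum of the local histograms (assumed cooperation step for the root)."""
    total = Counter()
    for partition in partitions:
        total += local_counts(partition, attribute)
    return total


# Hypothetical training items: (record, class label).
items = [({"windy": i % 2 == 0}, "play" if i % 3 else "stay")
         for i in range(12)]
partitions = distribute(items, processors=3)
root_histogram = global_counts(partitions, "windy")
```

Because the summed histogram equals the single-machine histogram, any split criterion evaluated on it gives the same root split regardless of how the items were distributed.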

Page 31: VIRTUAL  PRESENCE

Association Rule Algorithms

• An association rule expresses an association relationship among a set of objects in a database

• These objects “occur together”, or “one implies the other”

• Formally: $X \Rightarrow Y$, where $X$ and $Y$ are sets of items (itemsets)

• Key terms
– Confidence
– Support

• The goal: to find all association rules that satisfy user-specified minimum support and minimum confidence constraints
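The two key terms can be made concrete with a short sketch. It assumes the usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X ⇒ Y is support(X ∪ Y) / support(X). The four transactions reuse the example database from the appendix of this tutorial; the function names are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X => Y: support(X u Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


# Transactions 10, 20, 30 and 40 from the appendix example.
transactions = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
]

rule_support = support(frozenset("BE"), transactions)
rule_confidence = confidence(frozenset("B"), frozenset("E"), transactions)
```

Here {B, E} occurs in three of the four transactions, and every transaction that contains B also contains E, so the rule B ⇒ E has support 0.75 and confidence 1.0.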

Page 32: VIRTUAL  PRESENCE

Association Rule Algorithms

• Apriori algorithm and its variations
– AprioriTid
– AprioriHybrid
– FT (Fault-tolerant) Apriori

• Distributed / Parallel algorithms (FDM, …)

Page 33: VIRTUAL  PRESENCE

Sequential Analysis

• Sequential Patterns

• The problem: finding all sequential patterns with user-specified minimum support

• Elements of a sequential pattern need not be:
– consecutive
– simple items

• Algorithms for finding sequential patterns
– “count-all” algorithms
– “count-some” algorithms
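A minimal sketch of what "support of a sequential pattern" means when pattern elements need not be consecutive and need not be single items. It assumes each customer sequence is an ordered list of itemsets and that a pattern is contained in a sequence if its elements are subsets of distinct sequence elements in the same order. The data and names are hypothetical; this shows only the counting step, not a full "count-all" or "count-some" algorithm.

```python
def contains(sequence, pattern):
    """True if the pattern's elements appear in order (gaps allowed),
    each as a subset of some element of the sequence."""
    pos = 0
    for element in pattern:
        # Advance to the next sequence element that covers this pattern element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def sequence_support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Hypothetical customer sequences; each element is an itemset.
customers = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"d"}],
    [{"b"}, {"a"}],
]
pattern = [{"a"}, {"d"}]
pattern_support = sequence_support(customers, pattern)
```

The first two customers contain the pattern (the first with a gap between `a` and `d`), the third does not, so the support is 2/3.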

Page 34: VIRTUAL  PRESENCE

Conclusion

• Drawbacks of existing algorithms
– Data size
– Data noise

• There are two critical technological drivers:
– Size of the database
– Query complexity

• The infrastructure has to be significantly enhanced to support larger applications

• Solutions
– Adding extensive indexing capabilities
– Using new HW architectures to achieve improvements in query time

Page 35: VIRTUAL  PRESENCE

THE NEW SOFTWARE PARADIGM

All software agents are programs, but not all programs are agents

Page 36: VIRTUAL  PRESENCE

Many Definitions

• Computational systems that inhabit some dynamic environment, sense and act autonomously, and realize a set of goals or tasks for which they are designed

• Hardware or (more usually) software-based computer system that enjoys the following properties:
- Reactive (sensing and acting)
- Autonomous
- Goal-oriented (pro-active, purposeful)
- Temporally continuous
- Communicative (socially able)
- Learning (adaptive)
- Mobile
- Flexible
- Character

Page 37: VIRTUAL  PRESENCE

Interesting Topic of Study

• They draw on and integrate many diverse disciplines of computer science and other areas:
– objects and distributed object architectures
– adaptive learning systems
– artificial intelligence and expert systems
– collaborative online social environments
– security
– knowledge-based systems, databases
– communications networks
– cognitive science and psychology

Page 38: VIRTUAL  PRESENCE

What Problems Do Agents Solve?

• The client/server network bandwidth problem

• Problems in the design of a client/server architecture

• The problems created by intermittent or unreliable network connections

• Attempts to get computers to do real thinking for us

Page 39: VIRTUAL  PRESENCE

The New Software Paradigm

• Unless special care has been taken in the design of the code, two software programs cannot interoperate

• The promise of agent technology is to move the burden of interoperability from software programmers to the programs themselves

• This can happen if two conditions are met:
– A common language (Agent Communication Language, ACL)
– An appropriate architecture

Page 40: VIRTUAL  PRESENCE

The Need for Standards

• Anywhere, anytime consumer access to the universal bouquet of information and services is the new goal of the information revolution

• The scope of Internet standards makes the scope of choices extreme

• The Foundation for Intelligent Physical Agents (FIPA), established in 1996 in Geneva
  • international non-profit association of companies and organizations
  • specifications of generic agent technologies

Page 41: VIRTUAL  PRESENCE

FIPA Specifications

• Agent Management

• Agent Communication Language

• Agent/Software Integration

• Agent Management Support for Mobility

• Human-Agent Interaction

• Agent Security Management

• Agent Naming

• FIPA Architecture

• Agent Message Transport

etc.

Page 42: VIRTUAL  PRESENCE

Agent Management

• Provides the normative framework within which FIPA agents exist and operate

• Establishes the logical reference model for the creation, registration, location, communication, migration and retirement of agents

- The entities contained in the reference model are logical capability sets and do not imply any physical configuration

- Additionally, the implementation details of individual APs and agents are the design choices of the individual agent system developers

Page 43: VIRTUAL  PRESENCE

Components of the Model

• Agent
- computational process
- fundamental actor on an AP
- as a physical software process, has a life cycle that has to be managed by the AP

• Directory Facilitator
- yellow pages to other agents
- supported functions: register, deregister, modify, search

• Agent Management System
- white pages services to other agents
- maintains a directory of AIDs, which contain transport addresses
- supported functions: register, deregister, modify, search, get-description, operations for underlying AP

• Message Transport Service
- communication method between agents

• Agent Platform
- physical infrastructure in which agents can be deployed

• Software
- all non-agent, executable collections of instructions accessible through an agent
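The "yellow pages" role of the Directory Facilitator can be pictured with a toy in-memory directory offering the four functions named on the slide. This is only an illustration under simplifying assumptions: the class, its method signatures and the service names are hypothetical, and a real FIPA platform exchanges these requests as ACL messages rather than as direct method calls.

```python
class DirectoryFacilitator:
    """Toy yellow-pages directory: agent name -> advertised services."""

    def __init__(self):
        self._entries = {}

    def register(self, agent, services):
        if agent in self._entries:
            raise ValueError(f"{agent} is already registered")
        self._entries[agent] = set(services)

    def deregister(self, agent):
        del self._entries[agent]

    def modify(self, agent, services):
        if agent not in self._entries:
            raise KeyError(agent)
        self._entries[agent] = set(services)

    def search(self, service):
        """Names of all agents that advertise the given service."""
        return sorted(name for name, offered in self._entries.items()
                      if service in offered)


df = DirectoryFacilitator()
df.register("agent_j", ["reserve-ticket"])
df.register("agent_k", ["reserve-ticket", "weather"])
ticket_agents = df.search("reserve-ticket")
df.modify("agent_k", ["weather"])
df.deregister("agent_j")
```

The Agent Management System's "white pages" could be sketched the same way, keyed by agent identifier and storing transport addresses instead of services.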

Page 44: VIRTUAL  PRESENCE

Agent Life Cycle

• FIPA agents exist physically on an AP and utilize the facilities offered by the AP for realising their functionalities

• In this context, an agent, as a physical software process, has a physical life cycle that has to be managed by the AP

The state transitions of agents can be described as:

- create
- invoke
- destroy
- quit
- suspend
- resume
- wait
- wake up
- move*
- execute*
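The transitions listed above can be read as the edges of a small state machine. The slide names only the transitions, so the state names below (`initiated`, `active`, `suspended`, `waiting`, `transit`, `terminated`) and the choice of which transition is legal in which state are assumptions made for illustration; they loosely follow the shape of the FIPA life-cycle model and should be checked against the specification.

```python
# (current state, transition) -> next state; an assumed mapping.
TRANSITIONS = {
    ("initiated", "invoke"): "active",
    ("active", "suspend"): "suspended",
    ("suspended", "resume"): "active",
    ("active", "wait"): "waiting",
    ("waiting", "wake up"): "active",
    ("active", "move"): "transit",
    ("transit", "execute"): "active",
}


class AgentLifeCycle:
    def __init__(self):
        # 'create' brings the agent into existence in its initial state.
        self.state = "initiated"

    def apply(self, transition):
        if self.state == "terminated":
            raise ValueError("agent no longer exists")
        if transition in ("destroy", "quit"):
            # Assumed: termination is allowed from any live state.
            self.state = "terminated"
            return self.state
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"'{transition}' is not allowed in state '{self.state}'")
        self.state = TRANSITIONS[key]
        return self.state


agent = AgentLifeCycle()
history = [agent.apply(t) for t in ("invoke", "wait", "wake up", "move", "execute", "quit")]
```

Keeping the table as data makes it easy for a platform to reject illegal requests, such as resuming an agent that was never suspended.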

Page 45: VIRTUAL  PRESENCE

Agent Communication Language

• The specification consists of a set of message types and the description of their meanings

• Requirements:
– Implementing a subset of the pre-defined message types and protocols
– Sending and receiving the not-understood message
– Correct implementation of communicative acts defined in the specification
– Freedom to use communicative acts with other names, not defined in the specification
– Obligation of correctly generating messages in the transport form
– Language must be able to express propositions, objects and actions
– The use of Agent Management Content Language and ontology

Page 46: VIRTUAL  PRESENCE

ACL Syntax Elements

• Pre-defined message parameters:

:sender

:receiver

:content

:reply-with

:in-reply-to

:envelope

:language

:ontology

:reply-by

:protocol

:conversation-id

• Communicative acts:

accept-proposal, agree, cancel, cfp, confirm, disconfirm, failure, inform, inform-if, inform-ref, not-understood, propose, query-if, query-ref, refuse, reject-proposal, request, request-when, request-whenever, subscribe

Page 47: VIRTUAL  PRESENCE

Communication Examples

- Agent i asks agent j for its available services:

    (query-ref
      :sender i
      :receiver j
      :content
        (iota ?x (available-services j ?x))
      …)

- Agent j replies that it can reserve trains, planes and automobiles:

    (inform
      :sender j
      :receiver i
      :content
        (= (iota ?x (available-services j ?x))
           ((reserve-ticket train)
            (reserve-ticket plane)
            (reserve automobile)))
      …)

- Agent j refuses to reserve a ticket for i, since there are insufficient funds in i's account:

    (refuse
      :sender j
      :receiver i
      :content
        ((action j (reserve-ticket LHR, MUC, 27-sept-97))
         (insufficient-funds ac12345))
      :language sl)

- Agent i did not understand a query-if message because it did not recognize the ontology:

    (not-understood
      :sender i
      :receiver j
      :content ((query-if :sender j :receiver i …)
                (unknown (ontology www)))
      :language sl)

- Agent i confirms to agent j that it is, in fact, true that it is snowing today:

    (confirm
      :sender i
      :receiver j
      :content "weather( today, snowing )"
      :language Prolog)

- Agent i, believing that agent j thinks that a shark is a mammal, attempts to change j's belief:

    (disconfirm
      :sender i
      :receiver j
      :content (mammal shark))

- Agent i asks agent j if j is registered with domain server d1:

    (query-if
      :sender i
      :receiver j
      :content
        (registered (server d1) (agent j))
      :reply-with r09)
    ...
    (inform
      :sender j
      :receiver i
      :content (not (registered (server d1) (agent j)))
      :in-reply-to r09)

- Auction bid:

    (inform
      :sender agent_X
      :receiver auction_server_Y
      :content
        (price (bid good02) 150)
      :in-reply-to round-4
      :reply-with bid04
      :language sl
      :ontology auction)
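Messages of the shape shown in these examples can be assembled mechanically. The helper below is a hypothetical sketch, not part of any FIPA library: it only checks the communicative act against the list given on the ACL Syntax Elements slide and serializes keyword parameters in the `:name value` form, without validating the content expression.

```python
COMMUNICATIVE_ACTS = {
    "accept-proposal", "agree", "cancel", "cfp", "confirm", "disconfirm",
    "failure", "inform", "inform-if", "inform-ref", "not-understood",
    "propose", "query-if", "query-ref", "refuse", "reject-proposal",
    "request", "request-when", "request-whenever", "subscribe",
}


def acl_message(act, **params):
    """Serialize one ACL message; underscores in names become hyphens."""
    if act not in COMMUNICATIVE_ACTS:
        raise ValueError(f"unknown communicative act: {act}")
    lines = [f"({act}"]
    for name, value in params.items():
        lines.append(f"  :{name.replace('_', '-')} {value}")
    return "\n".join(lines) + ")"


query = acl_message(
    "query-if",
    sender="i",
    receiver="j",
    content="(registered (server d1) (agent j))",
    reply_with="r09",
)
print(query)
```

The `reply_with` value is what lets the answering agent tie its `inform` back to this query through `:in-reply-to`, as in the registration example above.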

Page 48: VIRTUAL  PRESENCE

Agent/Software Integration

• Integration of services provided by non-agent software into a multi-agent community

• Definition of the relationship between agents and software systems

• Allowing agents to describe, broker and negotiate over software systems

• Allowing new software services to be dynamically introduced into an agent community

• Defining how software resources can be described, shared and dynamically controlled in an agent community

Page 49: VIRTUAL  PRESENCE

New Agent Roles

• To support the specification, two new agent roles have been identified:
– Agent Resource Broker (ARB)
– WRAPPER Agent

Page 50: VIRTUAL  PRESENCE

GoodNews

A system that automatically categorizes news reports according to whether they reflect positively or negatively on a company’s financial outlook

Page 51: VIRTUAL  PRESENCE

Introduction

• Correlation between news reports on a company’s financial outlook and its attractiveness as an investment

• The volume of such reports is huge

• A new text classification algorithm: “Domain Experts” with a “self-confident” sampling technique

• Two types of data
– (Human-)labeled
– Unlabeled

• The algorithm classifies financial news into five predefined categories: (good), (good, uncertain), (neutral), (bad, uncertain), (bad)

Page 52: VIRTUAL  PRESENCE

Introduction

• Text categorization task

• FCP (Frequently Co-located Phrase): the building element for the categorization algorithm

• Text categorization is a very difficult domain for the use of machine learning
– Very large number of input features
– High level of attribute and class noise
– Large percentage of irrelevant features

• Labeled data are very expensive, while unlabeled data are cheaply available

Page 53: VIRTUAL  PRESENCE

Categorization

• The algorithm categorizes each given news article into the predefined categories in terms of the referenced company’s financial well-being

• GOOD: strong and explicit evidence of the company’s financial status
– …shares of ABC company rose 2 percent to $24-15/16…

• GOOD, UNCERTAIN: predictions and forecasts of future profitability
– … ABC company predicts fourth-quarter earnings will be high…

Page 54: VIRTUAL  PRESENCE

Categorization

• NEUTRAL: nothing is mentioned about the financial well-being of the company
– … ABC announced plans to focus on products based on recycled materials…

• BAD, UNCERTAIN: predictions of future losses
– … ABC announced today that fourth-quarter results could fall short of expectations…

• BAD: explicitly bad evidence
– … shares of ABC fell $0.57 to $44.65 in early NY trading…

• Problems with construction of the training (i.e. labeled) data set: “inter-indexer inconsistency”

Page 55: VIRTUAL  PRESENCE

Co-located Phrase

• The proposed algorithm labels the “unlabeled” news articles through a voting process among experts, which are FCPs

• Definition: a co-located phrase is a sequence of nearby, but not necessarily consecutive, words
– … shares of ABC rose 8.5%… (shares, rose): GOOD
– …ABC presented its new product… (present, product): NEUTRAL

• Contextual information

• The use of heuristics to cope with the enormous “phrase space” (number of possible phrases)
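The definition of a co-located phrase and the voting idea can be sketched in a few lines. This is not the GoodNews implementation: the window size, the two hand-written experts and the plain majority vote are assumptions for illustration, and the sketch matches surface words (`presented`) because it does no stemming, whereas the slide's example uses the stem (present, product).

```python
from collections import Counter


def has_colocated_phrase(tokens, phrase, window=5):
    """True if the phrase words occur in order, not necessarily adjacent,
    with the whole match inside a span of `window` tokens."""
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        matched = 1
        for nxt in tokens[start + 1:start + window]:
            if matched < len(phrase) and nxt == phrase[matched]:
                matched += 1
        if matched == len(phrase):
            return True
    return False


# Hypothetical experts: co-located phrase -> category it votes for.
EXPERTS = {
    ("shares", "rose"): "GOOD",
    ("shares", "fell"): "BAD",
    ("presented", "product"): "NEUTRAL",
}


def vote(text, experts=EXPERTS, window=5):
    """Label a text by majority vote among the experts whose phrase occurs."""
    tokens = text.lower().split()
    votes = Counter(label for phrase, label in experts.items()
                    if has_colocated_phrase(tokens, phrase, window))
    return votes.most_common(1)[0][0] if votes else None


label = vote("shares of ABC rose 8.5%")
```

An article in which no expert fires gets no label (`None`), which is one simple way to leave uncertain articles for a later iteration.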

Page 56: VIRTUAL  PRESENCE

Naive Bayes vs. Domain Experts

• Naive Bayes with EM (Expectation Maximization)

• Problems with small sets of labeled (training) data

• EM (Expectation Maximization): a class of iterative algorithms for maximum likelihood estimation in problems with incomplete data

• The Domain Experts algorithm is able to deal with inconsistent hypotheses

• Iterative building of the training set

Page 57: VIRTUAL  PRESENCE

Implementation and Results

• The experiment focused on two performance criteria:
– Using unlabeled data for improving categorization accuracy
– The categorization itself

• The accuracy is around 75% (total of 2000 news articles)

• Comparison of a few different methods (picture)

Page 58: VIRTUAL  PRESENCE

Conclusions

• Domain Experts with SC sampling outperforms naive Bayes with EM: the collocation property and vote entropy are appropriate to such a domain

• The accuracy of around 75% is the limit with the techniques used

• Better performance could be achieved by using some natural language processing techniques

• Such techniques are still rudimentary today

Page 59: VIRTUAL  PRESENCE

iMatch

The vision of each MIT student having a personal software agent, which helps to manage its owner's academic life

Page 60: VIRTUAL  PRESENCE

Introduction

• The aim: bring together MIT students and staff who may usefully collaborate with each other

• This collaboration can have several goals:
– completing final projects
– studying for exams
– tutoring one another

• iMATCH agents are supposed to facilitate the matching of students and faculty for research, teaching and internship opportunities within and across campuses

Page 61: VIRTUAL  PRESENCE

iMatch Agent Architecture

• iMatch agents are situated within an environment

• Sensors of the agent convert environmental inputs into representations that can be manipulated within the agent

• Effectors translate actions planned by the agent into executable statements for the environment

• The action planner selects the action with the highest utility according to the owner’s preference specification
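The sensor, planner and effector roles described above can be sketched as three small functions. Everything concrete here is a hypothetical stand-in: the event format, the action names and the utility table are not taken from iMatch and merely represent an owner's preference specification.

```python
def sense(raw_event):
    """Sensor: turn an environmental input into an internal representation."""
    kind, _, detail = raw_event.partition(":")
    return {"kind": kind, "detail": detail}


def plan(percept, actions, utility):
    """Action planner: pick the action with the highest utility."""
    return max(actions, key=lambda action: utility(percept, action))


def act(action, percept):
    """Effector: translate the planned action into an executable statement."""
    return f"{action}({percept['detail']})"


def owner_utility(percept, action):
    # Assumed preference specification of the agent's owner.
    preferences = {
        ("study-request", "notify-owner"): 0.9,
        ("study-request", "ignore"): 0.1,
    }
    return preferences.get((percept["kind"], action), 0.0)


percept = sense("study-request:exam group")
chosen = plan(percept, ["ignore", "notify-owner"], owner_utility)
command = act(chosen, percept)
```

Keeping the utility function separate from the planner mirrors the slide's point that the choice of action is driven by the owner's preferences rather than hard-coded in the agent.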

Page 62: VIRTUAL  PRESENCE

Impacts and Benefits

• MIT
– Benefit MIT students by matching them to appropriate resources
– Aid the recruitment of student researchers
– Help students manage their lives
– Use iMATCH in Medical Computing

• GLOBAL
– Facilitate Cross Community Collaboration

Page 63: VIRTUAL  PRESENCE

Research Topics

• Knowledge representation
– preference specification

• Multi-agent systems
– reputation management system
– static interest matching
– dynamic interest matching

• Infrastructure
– distributed security infrastructure

Page 64: VIRTUAL  PRESENCE

Ceteris Paribus Preference

• Ceteris paribus relations express a preference over sets of possible outcomes

• All possible outcomes are considered to be describable by some (large) set of binary features (true or false)
– The specified features are instantiated to either true or false
– Other features are ignored

I prefer ice cream

I prefer chocolate

I prefer train

I prefer airplane

I prefer cell phone

I prefer e-mail
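One way to read a ceteris paribus ("all else equal") preference over binary features is sketched below. The interpretation is an assumption: a rule names a preferred and a dispreferred assignment of a few features, and one outcome is preferred to another if it matches the preferred assignment, the other matches the dispreferred one, and the two outcomes agree on every feature the rule does not mention. Feature names are hypothetical.

```python
def prefers(better, worse, outcome_a, outcome_b):
    """True if the rule 'better over worse, all else equal' ranks
    outcome_a above outcome_b."""
    mentioned = set(better) | set(worse)

    def satisfies(outcome, spec):
        return all(outcome.get(f) == v for f, v in spec.items())

    unmentioned = (set(outcome_a) | set(outcome_b)) - mentioned
    others_equal = all(outcome_a.get(f) == outcome_b.get(f) for f in unmentioned)
    return satisfies(outcome_a, better) and satisfies(outcome_b, worse) and others_equal


# Rule: "I prefer train" (train=True over train=False), all else equal.
better, worse = {"train": True}, {"train": False}

by_train = {"train": True, "window_seat": True}
by_plane = {"train": False, "window_seat": True}
by_plane_aisle = {"train": False, "window_seat": False}

comparable = prefers(better, worse, by_train, by_plane)
incomparable = prefers(better, worse, by_train, by_plane_aisle)
```

The second comparison yields no preference because the outcomes also differ in a feature the rule is silent about, which is exactly the situation where an agent might pose a hypothetical question to its owner.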

Page 65: VIRTUAL  PRESENCE

CPP Agent Configuration

• Specify a domain for preference
– Agent methods of communication and notification
– Different security settings of different servers

• Preference statements themselves
– How to get users to easily adjust C.P. rules (graphical interface)
– Pose hypothetical preference questions to the user to help complete the preferences of an ambivalent user

• People will only put down their true profile if they know that the system is secure

Page 66: VIRTUAL  PRESENCE

Static Interest Matching

• Group together similar users for a specific context

• This enables viewing a human user as a resource for dynamic resource discovery (locate experts, enthusiasts,...)

• The approach:
– Keyword matching
– Ontological matching using Kullback-Leibler (KL) distance
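The Kullback-Leibler distance between two interest profiles can be illustrated on keyword frequencies. How iMatch derives its distributions from an ontology is not described on the slide, so the sketch below makes its own assumptions: profiles are bags of keywords, a shared vocabulary is used, and add-one smoothing keeps every probability non-zero. Users and keywords are hypothetical.

```python
import math
from collections import Counter


def keyword_distribution(keywords, vocabulary, smoothing=1.0):
    """Smoothed relative frequency of each vocabulary word in a profile."""
    counts = Counter(keywords)
    total = len(keywords) + smoothing * len(vocabulary)
    return {word: (counts[word] + smoothing) / total for word in vocabulary}


def kl_divergence(p, q):
    """D(P || Q) = sum over words of P(w) * ln(P(w) / Q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


profiles = {
    "alice": ["agents", "agents", "security"],
    "bob": ["agents", "security", "security"],
    "carol": ["biology", "biology", "biology"],
}
vocabulary = sorted({w for words in profiles.values() for w in words})
dist = {user: keyword_distribution(words, vocabulary)
        for user, words in profiles.items()}

alice_to_bob = kl_divergence(dist["alice"], dist["bob"])
alice_to_carol = kl_divergence(dist["alice"], dist["carol"])
```

A smaller value means more similar interests, so this toy data would group the first two users together. The measure is not symmetric, so a matching system would have to decide in which direction to compare, or symmetrize it.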

Page 67: VIRTUAL  PRESENCE

Dynamic Interest Matching

• Location and/or temporal specific resource matching

• As students and their agents move from one physical location to another, iMatch services for matching the closest resources can be offered

• The idea: anything worthwhile is locatable

• The approach:
– Intentional naming scheme
– Reputation-based resource discovery

Page 68: VIRTUAL  PRESENCE

Technology

• Components
– Distributed Multi-Agent Infrastructures
– Ceteris Paribus preference-based Interest Matching
– Reputation Management Infrastructure

• Technology
– Microsoft .NET
– Bluetooth
– IEEE 802.11
– Smartcards (PC/SC)
– INS (Intentional Naming System)

Page 69: VIRTUAL  PRESENCE

Conclusion

• Benefit MIT students by matching them to appropriate resources

• Static interest matching
– Group together similar users for a specific context
– This enables viewing a human user as a resource for dynamic resource discovery (locate experts, enthusiasts,...)

• Dynamic interest matching
– Location and/or temporal specific resource matching
– As students and their agents move from one physical location to another, iMatch services for matching the closest resources can be offered

• Help students manage their lives

Page 70: VIRTUAL  PRESENCE

The near future…

The focus of the research is on e-tourism after the year 2005, but the applications of the proposed infrastructure are manifold

Page 71: VIRTUAL  PRESENCE

Introduction

• The assumptions:
– after the year 2005, each tourist in Europe will be equipped with a cell phone whose processing power equals or exceeds that of a Pentium IV
– whenever a tourism-based service or product is purchased, a mobile agent is assigned to that cell phone PC to monitor the behaviour of the customer
– all tourist cell phone PCs create an ad hoc network around the tourist attractions, and link to a data mine that collects all information of interest

Page 72: VIRTUAL  PRESENCE

How to accomplish it?

• The information of interest is not collected by asking the customer to fill out forms, but by monitoring the behaviour of the customer

• The collected information, sorted in the data mine, is made available to other tourists as an on-line, owner-independent source of information about the given services and/or products

Page 73: VIRTUAL  PRESENCE

What can be done…

• If a tourist would like to know, at that very moment, which restaurant has good food, good atmosphere and happy customers, he/she can access the data mine (via the Internet) and obtain information that is linked to that very moment and is created not by the owner of the business, but by the customers themselves

• Accessing the given restaurant’s website has two drawbacks:
– the information is not fresh, because it is only periodically updated
– the information is produced by the owner of the restaurant, and is therefore not completely objective

Page 74: VIRTUAL  PRESENCE

Conclusion

• Consequently, the proposed approach works much better, and represents a qualitative step forward in the domain of maximization of customer satisfaction

• This may mean that the privacy of the person is jeopardized; however, if the monitored behaviour is non-personalized, and if the customer obtains a discount for accepting mobile agents, privacy ceases to be an issue, and people will sign up voluntarily

Page 75: VIRTUAL  PRESENCE

Appendix

A Survey of the Data Mining Algorithms

Page 76: VIRTUAL  PRESENCE

Apriori Algorithm

• The task – mining association rules by finding large itemsets and translating them to the corresponding association rules;

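• $A \Rightarrow B$, or $A_1 \wedge A_2 \wedge \dots \wedge A_m \Rightarrow B_1 \wedge B_2 \wedge \dots \wedge B_n$, where $A \cap B = \emptyset$

• The terminology

– Confidence

– Support

– k-itemset – a set of k items;

– Large itemsets – the large itemset {A, B} corresponds to the following rules (implications): $A \Rightarrow B$ and $B \Rightarrow A$;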
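The slide lists support and confidence without defining them. A minimal sketch, assuming the usual definitions (support of an itemset = fraction of transactions containing it; confidence of A ⇒ B = support(A ∪ B) / support(A)). The names `TRANSACTIONS`, `support` and `confidence` are illustrative and not from the slides; the four transactions are the example database used later in this appendix.

```python
from typing import FrozenSet, List

# Example database from the slides (TIDs 10, 20, 30, 40), used for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
]


def support(itemset, transactions=TRANSACTIONS) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS) -> float:
    """Confidence of the rule antecedent => consequent (assumes support(antecedent) > 0)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


print(support("BE"))          # support of {B, E}
print(confidence("B", "E"))   # confidence of B => E
```

Under these definitions a large itemset such as {B, E} yields two candidate rules, B ⇒ E and E ⇒ B, each of which is then checked against a confidence threshold.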

Page 77: VIRTUAL  PRESENCE

Apriori Algorithm

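• The $\bowtie$ (join) operator definition

– n = 1: $S_2 = S_1 \bowtie S_1 = \{\{A\}, \{B\}, \{C\}\} \bowtie \{\{A\}, \{B\}, \{C\}\} = \{\{AB\}, \{AC\}, \{BC\}\}$

– n = k: $S_{k+1} = S_k \bowtie S_k = \{X \cup Y \mid X, Y \in S_k,\ |X \cap Y| = k-1\}$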

– X and Y must have the same number of elements, and must have exactly k-1 identical elements;

– Every k-element subset of any resulting set element (an element is actually a k+1 element set) has to belong to the original set of itemsets;
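The join operator defined on this slide can be sketched as a small function. This is an illustration under the stated rules (two k-element sets are combined when they share exactly k-1 elements, and the result is kept only if all of its k-element subsets belong to the original collection); the function name `join` is an assumption, not the authors' code.

```python
from itertools import combinations


def join(itemsets):
    """Build S_{k+1} from a collection S_k of equal-sized itemsets."""
    s_k = {frozenset(s) for s in itemsets}
    if not s_k:
        return set()
    k = len(next(iter(s_k)))
    result = set()
    for x, y in combinations(s_k, 2):
        # X and Y must share exactly k-1 elements, so X ∪ Y has k+1 elements.
        if len(x & y) == k - 1:
            candidate = x | y
            # Every k-element subset of the candidate must already be in S_k.
            if all(frozenset(sub) in s_k for sub in combinations(candidate, k)):
                result.add(candidate)
    return result


print(join([{"A"}, {"B"}, {"C"}]))
print(join([{"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]))
```

The second call uses the large 2-itemsets of the example that follows and leaves only {B, C, E}, because {A, B, C} and {A, C, E} each have a 2-element subset outside the input.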

Page 78: VIRTUAL  PRESENCE

Apriori Algorithm

• Example:

TID elements

10 A C D

20 B C E

30 A B C E

40 B E

Page 79: VIRTUAL  PRESENCE

Apriori Algorithm

• Step 1 – generate a candidate set of 1-itemsets C1

– Every possible 1-element set from the database is potentially a large itemset, because we don’t know the number of its appearances in the database in advance (a priori);

– The task amounts to identifying (counting) all the different elements in the database; every such element forms a 1-element candidate set;

– C1 = {{A}, {B}, {C}, {D}, {E}}

Page 80: VIRTUAL  PRESENCE

Apriori Algorithm

• Now, we are going to scan the entire database, to count the number of appearances for each one of these elements (i.e. one-element sets);

{A} 2

{B} 3

{C} 3

{D} 1

{E} 3

Page 81: VIRTUAL  PRESENCE

Apriori Algorithm

• Step 2 – generate a set of large 1-itemsets L1

– Each element in C1 whose support meets or exceeds the adopted minimum support (for example 50%, i.e. two of the four transactions) becomes a member of L1;

– L1 = {{A}, {B}, {C}, {E}} and we can omit D in further steps (if D doesn’t have enough support alone, there is no way it could satisfy the required support in combination with some other element(s));

{A} 2

{B} 3

{C} 3

{D} 1

{E} 3
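Steps 1 and 2 can be reproduced in a few lines. This is a sketch for the example database only; `TRANSACTIONS`, `MIN_SUPPORT`, `counts` and `L1` are illustrative names, and the threshold is read as "at least 50% of the transactions".

```python
from collections import Counter

# Example database from the slides: TID -> items.
TRANSACTIONS = {10: "ACD", 20: "BCE", 30: "ABCE", 40: "BE"}
MIN_SUPPORT = 0.5  # 50% of 4 transactions = 2 appearances

# Step 1: one scan counts every distinct item (the candidate set C1).
counts = Counter(item for items in TRANSACTIONS.values() for item in items)

# Step 2: keep the items whose count reaches the minimum (the large set L1).
min_count = MIN_SUPPORT * len(TRANSACTIONS)
L1 = sorted(item for item, n in counts.items() if n >= min_count)

print(dict(counts))
print(L1)
```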

Page 82: VIRTUAL  PRESENCE

Apriori Algorithm

• Step 3 – generate a candidate set of 2-itemsets, C2

– $C_2 = L_1 \bowtie L_1 = \{\{AB\}, \{AC\}, \{AE\}, \{BC\}, \{BE\}, \{CE\}\}$

– Count the corresponding appearances

• Step 4 – generate a set of large 2-itemsets, L2;

– Eliminate the candidates without minimum support;

– L2 = {{AC}, {BC}, {BE}, {CE}}

{AB} 1

{AC} 2

{AE} 1

{BC} 2

{BE} 3

{CE} 2

Page 83: VIRTUAL  PRESENCE

Apriori Algorithm

• Step 5 (C3)

– $C_3 = L_2 \bowtie L_2 = \{\{BCE\}\}$

– Why not {ABC} and {ACE}? Because their 2-element subsets {AB} and {AE} are not members of L2, the set of large 2-itemsets (the calculation follows the operator definition);

• Step 6 (L3)

– L3 = {{BCE}}, since {BCE} satisfies the required support of 50% (two appearances);

• There can be no further steps in this particular case, because $L_3 \bowtie L_3 = \emptyset$;

• Answer = $L_1 \cup L_2 \cup L_3$;

Page 84: VIRTUAL  PRESENCE

Apriori Algorithm

L1 = {large 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = apriori-gen(Lk-1);
    forall transactions t ∈ D do begin
        Ct = subset(Ck, t);
        forall candidates c ∈ Ct do
            c.count++;
    end;
    Lk = {c ∈ Ck | c.count ≥ minsup}
end;
Answer = ∪k Lk
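The pseudocode above can be turned into a short runnable sketch. This is one possible reading, not the authors' implementation: `apriori_gen` is implemented as the join-and-prune operator described earlier, `subset(Ck, t)` is replaced by a plain containment test, and `minsup_count` is an absolute count rather than a percentage. All function and parameter names are illustrative.

```python
from itertools import combinations


def apriori_gen(prev_large, k):
    """Join L_{k-1} with itself; keep k-itemsets whose (k-1)-subsets are all large."""
    candidates = set()
    for x, y in combinations(prev_large, 2):
        union = x | y
        if len(union) == k and all(
            frozenset(sub) in prev_large for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori(transactions, minsup_count):
    """Return a dict mapping every large itemset to its number of appearances."""
    transactions = [frozenset(t) for t in transactions]
    # First scan: count single items to obtain L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {c for c, n in counts.items() if n >= minsup_count}
    answer = {c: counts[c] for c in large}
    k = 2
    while large:
        candidates = apriori_gen(large, k)
        counts = {c: 0 for c in candidates}
        for t in transactions:        # one database scan per level
            for c in candidates:
                if c <= t:            # plays the role of subset(Ck, t)
                    counts[c] += 1
        large = {c for c, n in counts.items() if n >= minsup_count}
        answer.update({c: counts[c] for c in large})
        k += 1
    return answer


result = apriori(["ACD", "BCE", "ABCE", "BE"], minsup_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```

Run on the example database with a minimum of two appearances, the sketch returns the same large itemsets as the worked steps: four 1-itemsets, four 2-itemsets and {B, C, E}.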

Page 85: VIRTUAL  PRESENCE

Apriori Algorithm

• Enhancements to the basic algorithm

• Scan-reduction

– The most time-consuming operation in the Apriori algorithm is the database scan; it is originally performed after each candidate set generation, to determine the frequency of each candidate in the database;

– Scan number reduction – counting candidates of multiple sizes in one pass;

– Rather than counting only candidates of size k in the kth pass, we can also count the candidates $C'_{k+1}$, where $C'_{k+1}$ is generated from $C_k$ (instead of $L_k$), using the $\bowtie$ operator;

Page 86: VIRTUAL  PRESENCE

Apriori Algorithm

– Compare: $C'_{k+1} = C_k \bowtie C_k$ and $C_{k+1} = L_k \bowtie L_k$

– Note that $C'_{k+1} \supseteq C_{k+1}$

– This variation can pay off in later passes, when the cost of counting and keeping in memory the additional $C'_{k+1} \setminus C_{k+1}$ candidates becomes less than the cost of scanning the database;

– There has to be enough space in main memory for both $C_k$ and $C'_{k+1}$;

– Following this idea, we can make further scan reduction:

• $C'_{k+1}$ is calculated from $C_k$ for k > 1;

• There must be enough memory space for all $C_k$’s (k > 1);

– Consequently, only two database scans need to be performed (the first to determine L1, and the second to determine all the other $L_k$’s);
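The two-scan variant can be sketched as follows. This is an illustration under assumptions: C2 is built from L1, every later candidate set is built from the previous candidate set instead of from the large itemsets, and all candidates are counted together in the second scan. The names `join` and `two_scan_apriori` are hypothetical.

```python
from itertools import combinations


def join(sets_k, k):
    """Join k-itemsets into (k+1)-itemsets whose k-subsets all lie in sets_k."""
    out = set()
    for x, y in combinations(sets_k, 2):
        u = x | y
        if len(u) == k + 1 and all(frozenset(s) in sets_k for s in combinations(u, k)):
            out.add(u)
    return out


def two_scan_apriori(transactions, minsup_count):
    """Return (large itemsets with counts, number of candidates counted in scan 2)."""
    transactions = [frozenset(t) for t in transactions]
    # Scan 1: count single items to obtain L1.
    item_counts = {}
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
    l1 = {frozenset([i]) for i, n in item_counts.items() if n >= minsup_count}
    # Generate C2 from L1, then each further level from the previous candidates.
    all_candidates, level, k = set(), join(l1, 1), 2
    while level:
        all_candidates |= level
        level = join(level, k)
        k += 1
    # Scan 2: count candidates of every size in a single pass.
    counts = dict.fromkeys(all_candidates, 0)
    for t in transactions:
        for c in all_candidates:
            if c <= t:
                counts[c] += 1
    large = {c: n for c, n in counts.items() if n >= minsup_count}
    large.update({c: item_counts[next(iter(c))] for c in l1})
    return large, len(all_candidates)


large, n_candidates = two_scan_apriori(["ACD", "BCE", "ABCE", "BE"], minsup_count=2)
print(len(large), n_candidates)
```

On the example database this counts 11 candidates in the second scan, against 7 (six in C2 plus one in C3) for the level-by-level version, which illustrates the trade-off named on the slide: fewer scans in exchange for more candidates held in memory.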

Page 87: VIRTUAL  PRESENCE

Apriori Algorithm

• Abstraction levels

– Higher level associations are stronger (more powerful), but also less certain;

– A good practice would be adopting different thresholds for different abstraction levels (higher thresholds for higher levels of abstraction)

Page 88: VIRTUAL  PRESENCE

DHP Algorithm

• DHP = Direct Hashing and Pruning – another algorithm for mining association rules;

• Based on the Apriori algorithm (Ck/Lk generation in the kth step);

• Empirical analysis of the Apriori algorithm shows that candidate sets (Ck) are much larger than the corresponding sets of large itemsets (Lk), especially in the first few iterations;

• DHP introduces a more efficient candidate set generation method;

• The idea is to insert into Ck only those candidate sets that are likely to become large itemsets;

Page 89: VIRTUAL  PRESENCE

DHP Algorithm

• Additional improvement is accomplished through “two-dimensional” search base reduction – “length” (number of records in the search base) and “width” (number of relevant attributes in a record);

• Large itemsets’ characteristics:

– Every non-empty subset of a large itemset is a large itemset as well, for example, $\{BCD\} \in L_3 \Rightarrow \{\{BC\}, \{CD\}, \{BD\}\} \subseteq L_2$;

– It implies that a record is relevant for discovering large (k+1)-itemsets only if it contains at least k+1 large k-itemsets;
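The "length" reduction implied by this property can be sketched as a filter over the records. The example applies it to the database and the large 2-itemsets of the Apriori walkthrough; `relevant_records` and the variable names are illustrative and not part of the original algorithm description.

```python
# Example database from the slides: TID -> items.
TRANSACTIONS = {10: "ACD", 20: "BCE", 30: "ABCE", 40: "BE"}
# Large 2-itemsets found in the Apriori example.
L2 = [frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")]


def relevant_records(transactions, large_k, k):
    """Keep only records that contain at least k+1 large k-itemsets."""
    kept = {}
    for tid, items in transactions.items():
        record = frozenset(items)
        contained = sum(1 for s in large_k if s <= record)
        if contained >= k + 1:
            kept[tid] = items
    return kept


print(relevant_records(TRANSACTIONS, L2, k=2))
```

Only records 20 and 30 survive, and these are exactly the two records that contain the large 3-itemset {B, C, E}.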

Page 90: VIRTUAL  PRESENCE

DHP Algorithm

– During the $C_k \rightarrow L_k$ phase we might count large k-itemsets in each record; if their number in a particular record is less than k+1, we omit that record during the $C_{k+1}$ generation;

– Similarly, if a record contains one or more large (k+1)-itemsets, each element (item) of these itemsets appears in at least k candidates from $C_k$

• Hashing

– Hashing boosts the performance of the DHP algorithm;

– The algorithm does not specify any hash function in particular; it depends on the application;

– Likewise, it does not specify the size of the hash table (number of groups/addresses);

Page 91: VIRTUAL  PRESENCE

DHP Algorithm

• Application example

TID elements

10 A C D

20 B C E

30 A B C E

40 B E

Page 92: VIRTUAL  PRESENCE

DHP Algorithm

• Step 1 – generate a candidate set of 1-itemsets C1

– C1 = {{A}, {B}, {C}, {D}, {E}}

– Simultaneously with counting each element’s support, a hash tree is generated that contains all the elements from the database, in order to improve the counting performance;

• For each new element, DHP checks whether the element is already in the tree or not;

• If yes, DHP increments the current number of appearances for that element; otherwise, the element is added to the hash tree, and the number of its appearances is set to 1;

Page 93: VIRTUAL  PRESENCE

DHP Algorithm

• Once the appearances of each C1 element have been counted, all possible 2-element subsets of each transaction are generated and inserted into the H2 hash table;

– The address of a particular subset can be calculated from the positions of its elements in the C1 candidate set, using a chosen hash function h(x, y);

TID 2-element subsets

10 {AC}, {AD}, {CD}

20 {BC}, {BE}, {CE}

30 {AB}, {AC}, {AE}, {BC}, {BE}, {CE}

40 {BE}

Page 94: VIRTUAL  PRESENCE

DHP Algorithm

– For example, let’s adopt the following hash function: h({x y}) = (posC1(x)*10 + posC1(y)) mod 7;

• The corresponding H2 hash table is shown below:

address weight 2-element subsets

0 3 {AD} {CE} {CE}

1 1 {AE}

2 2 {BC} {BC}

3 0

4 3 {BE} {BE} {BE}

5 1 {AB}

6 3 {AC} {CD} {AC}
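The H2 table can be rebuilt with a short script. A sketch under one assumption: positions in C1 are counted from 1 (A = 1, …, E = 5), which is the reading that reproduces the addresses shown in the table. The names `POSITION`, `N_BUCKETS`, `buckets` and `weights` are illustrative.

```python
from itertools import combinations

# Example database from the slides: TID -> items.
TRANSACTIONS = {10: "ACD", 20: "BCE", 30: "ABCE", 40: "BE"}
# Assumed 1-based position of each item in C1 = {A, B, C, D, E}.
POSITION = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
N_BUCKETS = 7


def h(x, y):
    """Hash function from the slide: (pos(x) * 10 + pos(y)) mod 7."""
    return (POSITION[x] * 10 + POSITION[y]) % N_BUCKETS


# Insert every 2-element subset of every transaction into its bucket.
buckets = {addr: [] for addr in range(N_BUCKETS)}
for items in TRANSACTIONS.values():
    for x, y in combinations(sorted(items), 2):
        buckets[h(x, y)].append(x + y)

# The weight of an address is the number of subsets hashed to it.
weights = {addr: len(pairs) for addr, pairs in buckets.items()}

for addr in range(N_BUCKETS):
    print(addr, weights[addr], buckets[addr])
```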

Page 95: VIRTUAL  PRESENCE

DHP Algorithm

• Whenever a new element is added to the hash table, the weight of the particular address is increased by one;

• C2 is generated out of L1 (just as in the Apriori case);

• In addition, only those elements that map to addresses whose weight is greater than or equal to the specified minimum support (let the minimum support be 50%) are taken into consideration during the C2 generation;

• C2 = {{AC}, {BC}, {BE}, {CE}};

• It contains two elements fewer (!) than the C2 set generated by the Apriori algorithm for the same example database;
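The filtering step can be checked directly against the H2 weights from the previous slide. A minimal sketch, assuming 1-based positions in C1 and a minimum support of 50% read as two appearances; all names are illustrative.

```python
from itertools import combinations

L1 = ["A", "B", "C", "E"]
# Assumed 1-based position of each item in C1 = {A, B, C, D, E}.
POSITION = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
# Address weights of the H2 hash table shown on the previous slide.
H2_WEIGHTS = {0: 3, 1: 1, 2: 2, 3: 0, 4: 3, 5: 1, 6: 3}
MIN_COUNT = 2  # 50% of 4 transactions


def h(x, y):
    """Hash function from the example: (pos(x) * 10 + pos(y)) mod 7."""
    return (POSITION[x] * 10 + POSITION[y]) % 7


# Apriori would keep every pair built from L1.
apriori_c2 = [x + y for x, y in combinations(L1, 2)]
# DHP also requires the pair's address to carry enough weight.
dhp_c2 = [p for p in apriori_c2 if H2_WEIGHTS[h(p[0], p[1])] >= MIN_COUNT]

print(apriori_c2)
print(dhp_c2)
```

The pairs {AB} and {AE} are dropped because their addresses (5 and 1) each have weight 1, so neither pair can reach two appearances.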

Page 96: VIRTUAL  PRESENCE

DHP Algorithm

• In general, the Hk hash table is used for the Ck candidate set generation in the kth step of the algorithm; Hk is created in the previous (k-1)th step;

• Each address of the Hk hash table contains a number of k-element subsets as elements; its weight denotes the number of elements;

• If an address doesn’t satisfy the minimum support requirement, none of the elements (sets) mapped to that address can satisfy the requirement on its own; therefore, all the elements (sets) at such Hk addresses are omitted from the Ck generation;

• During the kth step, Ck is generated starting from Lk-1, with the restrictions described above;

Page 97: VIRTUAL  PRESENCE

DHP Algorithm

• Conclusions:

– DHP outperforms Apriori, for the same input data;

– The time spent on generating the hash tables (especially H2) is outweighed by the greatly reduced candidate sets (C2, …);

– The same improvements applied to Apriori may also be applied here (scan reduction, abstraction levels, …)

Page 99: VIRTUAL  PRESENCE

THE END

Quatenus nobis denegatum diu vivere, relinquamus aliquid, quo nos vixisse testemur (Since we are denied a long life, let us leave something to bear witness that we have lived)

Authors: Voislav Galić, Dušan Zečević, Đorđe Đurđević, Veljko Milutinović

http://galeb.etf.bg.ac.yu/~vm/tutorial