Ad hoc data integration for mobile GIS applications
Ramya Venkateswaran ([email protected])
Contents
1. Scenario
2. Research Objective
3. Introduction: Overview of the GenW2 project
4. Motivation: Why is Ad hoc Data Integration needed?
5. State of the Art
6. Research Questions: Discuss 3 research questions
7. Methods: TourGuide and friends
8. Next Steps: Data Enrichment and Quality control
1. Scenario
Scenario of Usage
I will be vacationing in Paris and I want to visit some of the famous palaces, history-related places and other tourist locations in Paris.
Recommendations from: People, Tourist Guides, Albums & Images, Tourist & Travel Websites. Other sources?
Scenario of Usage
I’d still like to go to Paris...
Recommendations from: People, Tourist Guides, Albums & Images, Tourist & Travel Websites, TourGuide. Other sources?
2. Research Objective
Objective of my research: Ad hoc Data Integration
Data Integration
• Flavour Based integration
• Ad hoc DI vs. Traditional DI
• TourGuide
Data enrichment
• POI Enrichment
• Website credibility
Data quality control
• Completeness
• Correctness
• Credibility
• User feedback
3. Overview and Introduction
Overview of the GenW2 Project
• Short for: Generalization for portrayal in Web and Wireless mapping
• Develop new methods for web and wireless mapping
• Focus on ad hoc integration of heterogeneous information and on-the-fly map generalization in a mobile context.
The GenW2 Framework
[Figure: GenW2 architecture diagram. Components: User; Privacy Controller and Firewall; Parser; Ruleset & Association Component; Spatio-Temporal Event handler; Information retrieval component; Data Integrator; Intelligent Ranker; Filter & Relevance Component; Generalization; Visualization. Stores: Internal Database, Static datasets, Facts DB, MRDB. Data sources: Web. Data flow labels: Query, Parsed Query, Data, Unranked dataset, Ranked dataset, Result.]
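The GenW2 Framework
[Figure: detail of the framework diagram. The Data Integrator consists of a Web Interaction Component, a Wrapper Generator and a Data Transformer, and draws on the Web (image metadata, webservices, webpages). Other elements shown: Parser, Parsed Query, Data, Data sources, Static datasets, Facts DB, MRDB, Unranked dataset, Intelligent Ranker, Ranked dataset. Numbered markers 1 to 3 appear in the diagram.]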
Types of Data sources
• MRDB
• Facts DB
• Image metadata
• Webservices
• Web pages
• Static datasets
4. Motivation - Why is Ad hoc Data Integration needed?
Motivation
• So many data sources and so little structure
• Web as a database – Too much information to ignore!
• Ad hoc integration – Need-based, according to scenario and flavour, unlike search engines.
• Importance of recording certain facts that can enrich the MRDB and the integration process.
5. State of the art
Relevant Domains
• Recommendation Systems
• Information Filtering
• Information Retrieval
• Collaborative Filtering
Integration, IR and decision systems
• Different concepts and methods in Data Integration
• Data Integration from multiple sources
• Geospatial data mining and integration (Knoblock et al., 2001; Michalowski et al., 2004)
• Mashup web data for overall importance of landmarks (Grabler et al., 2008)
• SPIRIT – Design, techniques and implementation (Purves et al., 2007; Jones et al., 2002; Bucher et al., 2005)
• Geo parsing, geo coding and IR techniques (Clough et al., 2005)
Integration, IR and decision systems
• Methods for marking tourist locations and a guide that is 'context aware' (Abowd et al., 2004)
• Activity-based model of decisions that are affected by activity-travel behavior, and that also predicts the activities (Arentze and Timmermans, 2004)
• Voluntary information from a community, collaborative semantics, recommendation systems (Schlieder, 2007)
Data Enrichment
• Methods and algorithms for the provision of auxiliary data and its use for controlling an automated adaptive generalization process (Neun, 2007)
Data quality and assessment
• Framework for efficient and accurate integration of geospatial data from a large number of sources
• Positional accuracy, completeness (Thakkar et al., 2007)
• VGI (Volunteered Geographic Information)
• Trust models for Gazetteers (Keßler et al., 2009)
Observations from literature
• Considerable work and methods for traditional data integration; variety of methods in IR and GIR
• Less work and fewer methods for data integration from multiple and dynamic sources (focus on semantics rather than data and context) and for recording reusable facts
• Considerable work on user modeling, activities and activity recommendation
• Data enrichment work for improving generalization
Challenges
• Datasets are not static; they are dynamic and heterogeneous
• Auxiliary data
• Determining parameters (user categories, activities, habits etc., not a single user or set of preferences)
• Point of complete integration
• Methods to test and evaluate the effectiveness
6. Research Questions
RQ1 – Flavour Based Integration
Given an activity and unrelated data that is heterogeneous and dynamic, what is an effective method of data integration, so that the results are streamlined towards information about events and places for a set of users?
• Flavour based data integration from various sources
• Ad hoc DI vs. Traditional DI
• Tour guide – An example of web data integration
RQ2 – Data Enrichment
How can the Generalization for portrayal in Web and Wireless mapping (GenW2) framework record and exploit valuable reusable information, obtained from the preceding data integration?
• Facts DB
• Activity-Location pairs
• Data source credibility (Keßler et al., 2009)
• User feedback
RQ3 – Quality of data
What are the different metrics that can be used to control and/or assess the quality of the integrated data?
• Measurement of Quality?
• Quality of data by completeness (Thakkar et al., 2007)
• Quality of data by correctness (Thakkar et al., 2007)
• Another metric for Quality Assessment
• Quality of data by collective user feedback
• Credibility rank of information sources (Keßler et al., 2009)
• Evaluation Methodology
7. Methods
Flavour Based Data Integration
Related domains: Recommendation Systems, Information Filtering, Information Retrieval, Collaborative Filtering
Definition - Flavour Based Data Integration
Definitions from the related domains (Recommendation Systems, Information Filtering, Information Retrieval, Collaborative Filtering):
“The central idea here is to base personalized recommendations for users on information obtained from other, ideally likeminded, users.” (Billsus and Pazzani, 1998).
“use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices” (Resnick and Varian 1997).
“a field of study designed for creating a systematic approach to extracting information that a particular person finds important from a larger stream of information” (Canavese, 1994).
“the goal of an information [retrieval] system is for the user to obtain information from the knowledge resource which helps her/him in problem management” (Belkin, 1984)
Keyphrases in FBDI
• Systematic approach to extracting information
• Obtain information from one or many knowledge resource/s
• Recommendations for user groups or user categories
• Opinions of a community of users
• Keyword, flavour or activity such as tourism, history, sport, culture, shopping etc.
Definition of FBDI
FBDI is an activity-based, systematic approach to extracting and integrating information from multiple knowledge sources, depending on the habits of certain user groups or user categories, and capable of learning over time.
• Flavour = typical activities of a certain user group
• Examples – Tourism, Shopping, Sports, Historical excursions, Cultural excursions etc.
Adaptive tour guide for Paris
• Flavour Based Integration with web as datasource
• Only web as the database (Grabler et al., 2008)
• Integration of data on: Tourism, Transport, User feedback, User Rating, Facebook profile, Dopplr profile
• Scheduler
Data Integrator
Example of web data integration
Functional components (Baumgartner et al., 2009):
• Web interaction component: Lonelyplanet, wikitravel, virtualtourist, tripadvisor and official tourist website
• Wrapper generator: OpenKapow RoboMaker
• Data transformer: DOM parser for RSS and XML formats
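The data transformer step (a DOM parser for RSS and XML) could look roughly like the sketch below. It uses Python's standard `xml.dom.minidom`; the sample feed, the helper names `_text` and `rss_to_records`, and the output field names are illustrative assumptions, not the TourGuide implementation.

```python
from xml.dom import minidom

# Hypothetical RSS snippet standing in for the output of a wrapper.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Paris sights (illustrative)</title>
    <item>
      <title>Louvre</title>
      <link>http://example.org/louvre</link>
      <description>Former royal palace, now a museum.</description>
    </item>
    <item>
      <title>Versailles</title>
      <link>http://example.org/versailles</link>
      <description>Palace and gardens outside Paris.</description>
    </item>
  </channel>
</rss>"""


def _text(item, tag):
    """Return the text of the first <tag> element inside item, or '' if absent."""
    nodes = item.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data.strip()


def rss_to_records(rss_text):
    """Transform every RSS <item> into a plain dictionary (assumed target format)."""
    dom = minidom.parseString(rss_text)
    return [
        {
            "name": _text(item, "title"),
            "url": _text(item, "link"),
            "description": _text(item, "description"),
        }
        for item in dom.getElementsByTagName("item")
    ]


records = rss_to_records(SAMPLE_RSS)
for record in records:
    print(record["name"], "-", record["url"])
```

Records from different feeds end up in one uniform shape here, which is the property a later ranking step would rely on.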
Web data Extraction
• Semi automatic wrappers
• Automatic wrapper induction: WIEN (Kushmerick et al., 1997), Stalker (Muslea et al., 2001), DEBye (Laender et al., 2000)
• Academic: XWRAP (Liu et al., 2000), Lixto (Baumgartner et al., 2001), Wargo (Pan et al., 2002)
• Commercial: RoboMaker (Kapow Technologies), WebQL (QL2 Software Inc.)
Data Integrator
Example of web data integration
• Google as a first part of integration
• Second part - Functional components (Baumgartner et al., 2009):
  • Web interaction component: lonelyplanet, wikitravel, virtualtourist, tripadvisor and official tourist website
  • Wrapper generator: OpenKapow RoboMaker
  • Data transformer: DOM parser for RSS and XML formats
Intelligent Ranker and Scheduler
• Third step of integration.
• Applies different profiles to the data, like Facebook and Dopplr.
• Arranges the data in a ranked form depending on matches from user interests and activities.
• Brute force cumulative ranking algorithm, with scores: 3 – Explicitly mentioned; 2 – Description match; 1 – Suggested by other users
• Merges data from public transport
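The cumulative scoring scheme (3, 2, 1) can be sketched as follows. Only the three weights come from the slide. How a match is detected is an assumption made for illustration: "explicitly mentioned" is read as the place name appearing in the user's profile, "description match" as a profile interest keyword appearing in the place description. The data and the names `score_place` and `rank_places` are hypothetical.

```python
# Weights from the slide; the matching rules below are assumptions.
EXPLICIT, DESCRIPTION, SUGGESTED = 3, 2, 1


def score_place(place, profile, suggested_names):
    """Add up the weights of every rule that the place satisfies."""
    score = 0
    if place["name"] in profile["mentioned_places"]:
        score += EXPLICIT
    description = place["description"].lower()
    for interest in profile["interests"]:
        if interest.lower() in description:
            score += DESCRIPTION
    if place["name"] in suggested_names:
        score += SUGGESTED
    return score


def rank_places(places, profile, suggested_names):
    """Score every place (brute force) and sort by descending score, then name."""
    scored = [(score_place(p, profile, suggested_names), p["name"]) for p in places]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical input data.
places = [
    {"name": "Louvre", "description": "Former royal palace and art museum"},
    {"name": "Versailles", "description": "Palace with historic gardens"},
    {"name": "Galeries Lafayette", "description": "Department store for shopping"},
]
profile = {"mentioned_places": {"Louvre"}, "interests": ["palace", "history"]}
suggested = {"Versailles", "Galeries Lafayette"}

ranking = rank_places(places, profile, suggested)
for score, name in ranking:
    print(score, name)
```

Plain substring matching is deliberately naive: "history" does not match "historic" in this sketch, which hints at why the scoring algorithm is listed as something to improve.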
Facts DB
• Location information from the MRDB and map LOD with place
• Activity-Location pairs
• Facts DB structure
Facts DB Structure
• High Level Structure
• Lower level structure – Database Object maps to more locations
• Limit to two levels
• Inverse Page Lookup
Database Object (example)
Activity: Shopping
LocationFrom: 47°22′40″N, 8°32′25″E
LocationTo: 47.3671°N, 8.5409°E
Name: Bahnhofstrasse
Rank: 3
User Feedback: Shop for watches, jewelry, clothes
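One possible in-memory shape for such a database object is sketched below. The field names mirror the column headings of the example. The `FactRecord` class, the `dms_to_decimal` helper and the per-activity index are assumptions for illustration and not the actual Facts DB schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FactRecord:
    """One activity-location pair, following the example's column headings."""
    activity: str
    location_from: tuple  # (latitude, longitude) in decimal degrees
    location_to: tuple    # (latitude, longitude) in decimal degrees
    name: str
    rank: int
    user_feedback: str


def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to decimal degrees (N and E are positive)."""
    return degrees + minutes / 60 + seconds / 3600


record = FactRecord(
    activity="Shopping",
    location_from=(dms_to_decimal(47, 22, 40), dms_to_decimal(8, 32, 25)),
    location_to=(47.3671, 8.5409),
    name="Bahnhofstrasse",
    rank=3,
    user_feedback="Shop for watches, jewelry, clothes",
)

# Hypothetical top-level index: activity -> records, so that one activity
# can map to several locations.
facts_by_activity = defaultdict(list)
facts_by_activity[record.activity].append(record)

print(facts_by_activity["Shopping"][0].name)
```

Storing both coordinates in decimal degrees keeps the two notations used in the example comparable.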
Data Quality
• Evaluation through completeness and correctness
• Example: Shopping stores in Bahnhofstrasse
• Extract lat-lng
• Shop name, website, details and contact details
• Shop opening and closing times
• Evaluate against manually collected data for completeness and correctness.
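The evaluation against manually collected data could be sketched as follows. The two definitions used here are assumptions chosen for illustration and are not necessarily those of Thakkar et al. (2007): completeness is the share of reference shops that were extracted, and correctness is the share of compared field values that agree with the reference. Shop names and values are made up.

```python
def completeness(extracted, reference):
    """Share of reference entries that also appear in the extracted data."""
    if not reference:
        return 1.0
    found = sum(1 for name in reference if name in extracted)
    return found / len(reference)


def correctness(extracted, reference, fields):
    """Share of compared field values that agree with the reference data."""
    compared = agreeing = 0
    for name, truth in reference.items():
        if name not in extracted:
            continue  # missing entries count against completeness only
        for field in fields:
            compared += 1
            if extracted[name].get(field) == truth.get(field):
                agreeing += 1
    return agreeing / compared if compared else 0.0


# Hypothetical manually collected reference data.
reference = {
    "Shop A": {"opens": "09:00", "closes": "18:30"},
    "Shop B": {"opens": "10:00", "closes": "19:00"},
    "Shop C": {"opens": "09:00", "closes": "20:00"},
    "Shop D": {"opens": "08:30", "closes": "18:00"},
}
# Hypothetical integration result: Shop D is missing, Shop B closes too early.
extracted = {
    "Shop A": {"opens": "09:00", "closes": "18:30"},
    "Shop B": {"opens": "10:00", "closes": "18:00"},
    "Shop C": {"opens": "09:00", "closes": "20:00"},
}

c = completeness(extracted, reference)
k = correctness(extracted, reference, ["opens", "closes"])
print(f"completeness={c:.2f} correctness={k:.2f}")
```

Keeping the two numbers separate makes it visible whether a source misses shops or reports wrong details about the shops it does find.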
8. Next Steps
• Formalizing parameters and methods for integration
• Improve scoring algorithm for places
• Structure of Facts DB for efficient storage and retrieval
• Develop quality control methods, such as considering user feedback and credibility
Open Questions
• At what point is the data integrated? When is it complete? Qualitative vs. Quantitative
• Error recovery and correction mechanism in FactsDB?
• Mapping of place’s score to LOD?
Milestones
Timeline: Year 1 (Fall 2008, Spring 2009); Year 2 (Fall 2009, Spring 2010); Year 3 (Fall 2010, Spring 2011)
• Literature review
• Develop overall framework
• Start to develop research questions and focus area.
• Literature review
• Develop research questions
• Define use cases
• Make a prototype of one use case - TourGuide
• Develop concept and methods for RQ1
• Implement parts of TourGuide
• Develop user tests for input to RQ2 and RQ3
• Continue work on RQ1. Formalise parameters.
• Analyse input from user tests and combine with other parameters for RQ2
• Continue work with RQ2 and start RQ3
• Formalise parameters for data quality control
• Perform evaluation of data, define and implement quality assessing/controlling parameters for FBDI
• Finalize publications
• Thesis write-up
Summary: Expected contributions
• Working system and framework for ad hoc data integration that will work for certain flavours
• Methodology of Flavour based data integration (RQ1)
  • Structure
  • Algorithm for efficient data source selection depending on “flavour”
  • Algorithm for scoring different places depending on a number of parameters
• Concept and structure of FactsDB that will work with data from the MRDB for enrichment (RQ2)
• Improved and adapted parameters and a mechanism for checking the quality of the integrated data and some test cases (RQ3)
Thank you! Questions/Feedback?
Ramya Venkateswaran ([email protected])
Demo and slides at http://www.geo.uzh.ch/~ramya/kolloquium/