Information Quality Assessment in the WIQ-EI EU Project
http://www.dirinfo.unsl.edu.ar/noticias/articulo/charla-dra-elisabeth-lex-know-center-austria.html
www.know-center.at
November 17th, 2011
Information Quality in Social Media
Presentation at UNSL
Elisabeth Lex
Agenda
The Know-Center
The WIQ-EI project
Why Information Quality on the Web?
Selected Results
Conclusion
The Know Center – We are...
Austria’s Competence Center for Knowledge Management and Knowledge Technologies
Link between Science and Industry
A multi-disciplinary team of 40+ Scientists and Developers
Over 575 publications since 2001
100 Master theses, 26 PhD theses, 4 habilitations
Editors of 2 Journals: Journal of Universal Knowledge Management, Journal of Universal Computer Science
Organizer of the International Conference on Knowledge Management and Knowledge Technologies (I-KNOW)
The Know Center
2 Areas of Research:
Knowledge Relationship Discovery:
Detecting semantic entities, semantic relations in unstructured data
Cross-language and cross-domain search and retrieval
Automatic analysis of information structure and quality
User interfaces for visual analysis of large information repositories
Knowledge Services:
Web 2.0, Collective Intelligence and Social Network Analysis
Semantic Technologies, Semantic Web, Semantic Retrieval
Communication and Collaboration Technologies
Mobile Technologies
The WIQ-EI Project - Goals
Web Information Quality Evaluation Initiative
3 Objectives:
Development of Web Content Information Quality Measures
Plagiarism Detection and Authorship Attribution
Multilingual Opinion and Sentiment Mining
Derive algorithms, tools and test data sets
The WIQ-EI Project - Implementation
On a global scale:
Researcher exchanges between organisations from European countries (Austria, Germany, Spain, Greece) and non-European countries (Argentina, Mexico, India) with expertise in topic-relevant fields
Carry out research secondments, training and dissemination activities, challenges and workshops
Agenda
The Know-Center
Why Information Quality on the Web?
Selected Results
Conclusion
Introduction
On the Web - large amount of potentially useful content
Navigating is challenging
Web is changing: User Generated Content, Social Media
- Social media up to date
- Wide audience, highly dynamic
- Open to (almost) anyone
- Powerful, e.g. for media resonance analysis
Information Quality of Social Media is questionable!
What is Information Quality?
A multi-dimensional concept [Klein, 2001]
Different Types of Information Quality (IQ) [Knight2005]
E.g. [Wang1996]:
Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation
Accessibility IQ: Accessibility, Security
Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information, Presence of Author Information [Katerattanakul1999]
Representational IQ: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation
Information Quality – Link to Information Retrieval, Data Mining
Faceted Search
The Information Retrieval Process
Text Mining
Enables retrieval of core information from unstructured text
- Information Extraction
- Clustering
- ...
IQ Dimensions:
- Objectivity
- Accuracy
- ...
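The faceted-search part of this figure suggests that IQ dimensions such as objectivity or accuracy can serve as facets alongside ordinary metadata. The Python sketch below assumes every search hit already carries IQ labels produced by an upstream classifier; the `Document` type, the facet names and the label values are hypothetical and not part of the project's tooling.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """A search hit annotated with (hypothetical) IQ facet labels."""
    doc_id: str
    text: str
    facets: Dict[str, str] = field(default_factory=dict)


def facet_counts(docs: List[Document], facet: str) -> Counter:
    """Count how many hits carry each value of one IQ facet (as shown next to the facet)."""
    return Counter(d.facets[facet] for d in docs if facet in d.facets)


def filter_by_facets(docs: List[Document], selection: Dict[str, str]) -> List[Document]:
    """Keep only the hits whose labels match every facet value the user selected."""
    return [
        d for d in docs
        if all(d.facets.get(name) == value for name, value in selection.items())
    ]


hits = [
    Document("a", "Official statement on the budget", {"objectivity": "objective", "accuracy": "high"}),
    Document("b", "I can't believe this!!", {"objectivity": "subjective", "accuracy": "low"}),
    Document("c", "Summit report", {"objectivity": "objective", "accuracy": "low"}),
]

print(facet_counts(hits, "objectivity"))  # Counter({'objective': 2, 'subjective': 1})
selected = filter_by_facets(hits, {"objectivity": "objective", "accuracy": "high"})
print([d.doc_id for d in selected])  # ['a']
```

Keeping the IQ labels as plain key/value facets means a new dimension (e.g. credibility) can be added without changing the filtering code.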
Our work – Focus on Media Domain
Goal: Assess intrinsic Information Quality in social media, traditional media and arbitrary Web content
Several IQ dimensions:
Objectivity
Emotionality
Credibility
Readability
In-depth versus Shallow
Expert versus Non-Expert
Personal versus Official
Agenda
The Know-Center
Why Information Quality in the Media Domain?
Selected Results
Conclusion
Results
Information Quality Dimension: Objectivity
Task:
Objectivity Classification in Blogs
Use features based on style properties:
Dataset: TREC Blogs08, 83 blogs, 12,844 blog posts
Results:
Accuracy of 87% for Objectivity Classification in Blogs
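The slide does not list the individual style features or the classifier. As a rough illustration only, the Python sketch below uses three hypothetical style features (rate of first-person pronouns, exclamation marks per sentence, question marks per sentence) and a nearest-centroid classifier. Nearest centroid is a swapped-in technique, not necessarily the one behind the 87% result, and the four training sentences are toy data.

```python
import re
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical feature set: the project's actual style features are not given on the slide.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def style_features(text: str) -> List[float]:
    """Map a post to [first-person rate, '!' per sentence, '?' per sentence]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tokens = max(len(tokens), 1)
    n_sentences = max(len(sentences), 1)
    return [
        sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        text.count("!") / n_sentences,
        text.count("?") / n_sentences,
    ]


def train(labelled: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Nearest-centroid training: average the feature vectors of each class."""
    by_label: Dict[str, List[List[float]]] = {}
    for text, label in labelled:
        by_label.setdefault(label, []).append(style_features(text))
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_label.items()}


def classify(model: Dict[str, List[float]], text: str) -> str:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = style_features(text)
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))


toy_data = [
    ("The committee published its annual report on Monday.", "objective"),
    ("The council approved the budget after a long debate.", "objective"),
    ("I love this phone! My best purchase ever!", "subjective"),
    ("Why would anyone buy this? I hate it!", "subjective"),
]
model = train(toy_data)
print(classify(model, "The minister announced the new policy today."))  # objective
print(classify(model, "I think we were robbed! Unbelievable!"))  # subjective
```

The features are deliberately topic-independent ratios, which matches the slide's point that style, rather than vocabulary, carries the objectivity signal.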
Results
Information Quality Dimension: Credibility
Rank blogs by credibility
Compare blogs with a credible source:
Quantity structure
Content similarity: Nouns, Verbs + Adjectives
Dataset: APA news articles, crawled blogs
Results:
Average precision of 83% for blog credibility ranking
Correlation between quantity structures of blogs and news
e.g. Query “Frankreich”, Pearson Correlation Coeff: 0.79
[Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW 2009.
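The two comparisons on this slide can be sketched in Python as follows. The sketch assumes blog posts and news articles have already been reduced to lists of nouns, verbs and adjectives (the part-of-speech filtering is omitted) and that the quantity structure is the number of documents per day matching a query. Cosine similarity over term counts is an assumed choice of content-similarity measure, since the slide does not name one; the `rank_blogs` helper and all data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine_similarity(terms_a: Sequence[str], terms_b: Sequence[str]) -> float:
    """Cosine similarity of two bags of words (here: pre-filtered nouns, verbs, adjectives)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long count series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def rank_blogs(blogs: Dict[str, List[str]], verified: List[str]) -> List[str]:
    """Hypothetical ranking: blogs closer to the verified news content come first."""
    return sorted(blogs, key=lambda name: cosine_similarity(blogs[name], verified), reverse=True)


news_terms = ["election", "vote", "parliament", "count", "win"]
blogs = {
    "blog_a": ["election", "vote", "parliament", "win"],
    "blog_b": ["celebrity", "party", "vote"],
}
print(rank_blogs(blogs, news_terms))  # ['blog_a', 'blog_b']

# Hypothetical documents per day for the same query in blogs and in news.
blog_per_day = [3, 5, 9, 4, 2]
news_per_day = [10, 14, 30, 12, 8]
print(round(pearson(blog_per_day, news_per_day), 2))
```

A coefficient close to 1, as in the "Frankreich" example, means blogs and news react to the same events at the same time, independent of how many documents each source produces.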
Results
Web Genre and Quality Classification
ECML/PKDD Discovery Challenge 2010
Task 1: Web Genre and Quality Facets
News/Editorial, Educational, Discussion, Commercial, Personal/Leisure, Web Spam
Bias, Trustworthiness, Neutrality
Task 2: English Content Quality: combination of the facets into one quality score
Task 3: Multilingual Content Quality: German, French
Dataset: English, German, French Web hosts: NLP Features, Content Features, Terms, Links
Approach: Ensemble classifier (J48, CFC, SVM)
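The slide names the base classifiers (J48, CFC, SVM) but not how their outputs are combined. The Python sketch below assumes a simple majority vote with a deterministic tie-break; the stand-in classifiers and the host feature names are hypothetical placeholders for trained models.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# A classifier maps a host's feature dict to a genre label.
Classifier = Callable[[Dict[str, float]], str]


def majority_vote(classifiers: Sequence[Classifier], host: Dict[str, float]) -> str:
    """Return the most frequent label; ties go to the label of the earliest-listed classifier."""
    votes = [clf(host) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    # votes keeps classifier order, so the first label reaching the top count wins a tie.
    return next(label for label in votes if counts[label] == top)


# Hypothetical stand-ins for trained J48, CFC and SVM models.
def j48_stub(host: Dict[str, float]) -> str:
    return "spam" if host["spam_term_ratio"] > 0.3 else "commercial"


def cfc_stub(host: Dict[str, float]) -> str:
    return "commercial" if host["price_mentions"] > 5 else "educational"


def svm_stub(host: Dict[str, float]) -> str:
    return "spam" if host["outlinks"] > 500 else "commercial"


ensemble = [j48_stub, cfc_stub, svm_stub]
host = {"spam_term_ratio": 0.4, "price_mentions": 12, "outlinks": 900}
print(majority_vote(ensemble, host))  # spam
```

Listing the most trusted model first turns the tie-break into a simple fallback to that model when all three disagree.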
Combined Quality Score
Use Case: Web Archival
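The slides do not give the formula behind the combined quality score. A common pattern is a weighted sum of the per-facet classifier outputs. The Python sketch below assumes that pattern, with hypothetical weights (positive for facets that raise quality, negative for spam and bias), and shows how such a score could select hosts for a web archive.

```python
from typing import Dict, List

# Hypothetical weights; the combination actually used in the challenge is not given on the slide.
WEIGHTS: Dict[str, float] = {
    "news_editorial": 1.0,
    "educational": 1.0,
    "discussion": 0.5,
    "commercial": -0.5,
    "personal_leisure": 0.0,
    "spam": -2.0,
    "bias": -1.0,
    "trustworthiness": 1.0,
    "neutrality": 0.5,
}


def quality_score(facet_probs: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-facet probabilities (each in [0, 1])."""
    unknown = set(facet_probs) - set(weights)
    if unknown:
        raise KeyError(f"no weight for facets: {sorted(unknown)}")
    return sum(weights[f] * p for f, p in facet_probs.items())


def select_for_archive(hosts: Dict[str, Dict[str, float]], budget: int) -> List[str]:
    """Web-archival use case: keep the `budget` hosts with the highest combined score."""
    ranked = sorted(hosts, key=lambda h: quality_score(hosts[h]), reverse=True)
    return ranked[:budget]


hosts = {
    "news.example": {"news_editorial": 0.9, "trustworthiness": 0.8, "bias": 0.1},
    "shop.example": {"commercial": 0.9, "spam": 0.2},
    "uni.example": {"educational": 0.95, "neutrality": 0.7},
}
print(select_for_archive(hosts, budget=2))  # ['news.example', 'uni.example']
```

Rejecting facets without a weight avoids silently ignoring a classifier output whose name was misspelled.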
Results
Web Genre and Quality Classification
Challenges:
Unbalanced, low-quality training data (the training data also contained Hungarian, Czech, ... hosts)
News and Educational are hard to separate
Too little training data for German and French hosts
Results:
The method performs best for Educational/Research (NDCG 0.688), Commercial (0.694) and Personal/Leisure (0.583)
English quality task: NDCG 0.844
Multilingual quality task: use topic-independent features from English hosts
German: NDCG 0.792, French: NDCG 0.823
[Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
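NDCG rewards rankings that place highly graded hosts near the top. The slides do not state which gain and discount variant the challenge used; the Python sketch below assumes the common form DCG@k = sum over ranks i of rel_i / log2(i + 1), normalised by the DCG of the ideal ordering. For simplicity, the ideal ordering is computed from the same list of graded results, and the grades are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of graded relevances listed in ranked order (top k only)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(list(relevances)[:k], start=1))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG divided by the DCG of the best possible ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical quality grades (2 = high, 1 = medium, 0 = low) of hosts in ranked order.
ranking = [2, 0, 1, 2, 0]
print(round(ndcg(ranking), 3))
print(ndcg([2, 2, 1, 0, 0]))  # 1.0 for a perfect ordering
```

Because the score is normalised per ranking, values such as 0.844 (English) and 0.792 (German) can be compared across tasks with different numbers of hosts.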
Agenda
The Know-Center
Why Information Quality in Social Media?
Selected Results
Conclusion
Conclusions
Summary
Information Quality (IQ) consists of multiple dimensions
Depends on the use case
BUT: Several dimensions are commonly agreed upon
IQ dimensions can be combined in one quality score
Supervised classification is often used to assess IQ
However, training data is needed!
Simple style-based features are suited to assessing IQ dimensions
Thank you for your attention!