![Page 1: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/1.jpg)
SYMPOSIUM ON BIAS AND DIVERSITY IN IR
A TESTBED FOR DIVERSIFICATION IN SEARCH
Koblenz, August 31, 2011
Michael Matthews, Barcelona Media/Yahoo! Research
![Page 2: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/2.jpg)
OVERVIEW
• Introduction to LivingKnowledge Testbed – The Diversity Engine
• Getting started – Our first application!
• Adding text analysis
• Adding multimedia analysis
• Evaluation
• Indexing and search
• Developing applications
• Future work
![Page 3: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/3.jpg)
DIVERSITY ENGINE
• Provides collections, annotation tools and an evaluation framework to allow for collaborative and comparable research
• Supports indexing and searching on a wide variety of document annotations including entities, bias, trust, polarity, and multimedia features
• Supports development of bias- and diversity-aware applications
![Page 4: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/4.jpg)
ARCHITECTURE
• Document Collections: Yahoo! News, ARC Crawls, NYT
• Analysis Pipeline
• Index/Search
• Application Development
• Evaluation Framework
• Prediction of Community Acceptance
• Sentiment in Comments, Comment Ratings
• Polarizing Videos, Distribution of Ratings
• Topic of Videos, Distribution of Ratings
![Page 5: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/5.jpg)
DESIGN DECISIONS
• Use Open Source tools when available
• Programming Language – Java 1.6
• Data format – LK XML
• Analysis tools – Operating System: Linux (any programming language)
• Indexing/Search – Solr
• GUI – JSP, HTML, JavaScript, CSS
![Page 6: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/6.jpg)
LK-XML format.
![Page 7: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/7.jpg)
DOCUMENT COLLECTIONS
• Supported Formats
– ARC (Internet Memory crawls), Text, HTML, Kyoto, BBN, NYT
• Collections
– Testing examples included with Diversity Engine
– Large ARCs available from Internet Memory
– Converters provided for other collections (MPQA, BBN, NYT) that have licensing restrictions
![Page 8: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/8.jpg)
ANALYSIS MODULES
Module groups (diagram): Image Processing, Image Annotation Processing, Text Processing, Text Annotation Processing
Modules shown:
Face Detection
Naturalness
Colourfulness
SIFT Features
City/Landscape
Tone
Photomontage
Face Tampering
Photo/Cartoon/CG Annotations
SentimentHistogram
Sentence Subjectivity
Syntax & Semantics
POS
OpenNLP Entities
SuperSense Tagger
Vector Quantisation
Dictionary
Phrases
Quotes
Disambiguated Entities
Document Layout
RDFa Injection
Readability4J
TimeML
Statements
Subjective Expressions
URLs
Wikipedia People
Wikipedia Places
EXIF Image Clustering
![Page 9: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/9.jpg)
INDEXING/SEARCH
• Solr
– Enterprise search platform built on top of Lucene
– XML input and output allows for easy integration with Diversity Engine
– Plug-in framework allows customization
– Built-in facet capabilities support indexing and searching on annotations
• Integration
– Converter from LK XML to Solr XML
– Plug-in for facet ranking and speed improvements
![Page 10: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/10.jpg)
APPLICATION DEVELOPMENT
• Basis for LivingKnowledge Applications
– Future Predictor
– Media Content Analysis
• Support development – coding required!
• Real World Problems
– HTML Extraction
– Scaling to Large Collections
– Provenance
– Some pluggable GUI components
– Examples to ease learning curve
![Page 11: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/11.jpg)
APPLICATION DEVELOPMENT
![Page 12: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/12.jpg)
APPLICATION DEVELOPMENT
![Page 13: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/13.jpg)
EVALUATION FRAMEWORK
• Framework for the evaluation of analysis tools
• Evaluates any possible annotation pipeline
• Measures correctness and quality
• Outputs Precision + Recall
• Compares annotation output of pipeline with ground truth data
![Page 14: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/14.jpg)
OUR FIRST APPLICATION
Download Diversity Engine release from SourceForge
tar xzvf [release file]
cd testbed
ant build
apps/testbed conf/testbed/tutorial-application.xml
What happened?
• 197 text files and 127 image files were converted from ARC format to LK XML and stored in devapps/example/data/lkxml
• 2 annotators were run over the collection
– OpenNLP for tokenization, sentence splitting, POS tags
– SST named entity recognizer
– Results stored in devapps/example/data/lkxml
• Files were converted to Solr XML format and indexed using Solr
– Solr XML stored in devapps/example/data/solr
• HTML visualization files stored in devapps/example/data/html
ant deploy-testbed
• Solr running at http://localhost:8983/solr/
• Example app running at http://localhost:8983/testbed/
![Page 15: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/15.jpg)
EXAMPLE SOLR OUTPUT
http://localhost:8983/solr/select/?q=putin
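The query URL above can also be assembled programmatically. The following is a minimal sketch, assuming the Solr instance from the tutorial is running locally; `facet=true` and `facet.field` are standard Solr request parameters, while the class and method names are hypothetical and not part of the Diversity Engine.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryUrl {

    /** Builds a select URL for the query, enabling faceting on the given fields. */
    static String build(String solrBase, String query, String... facetFields) {
        StringBuilder url = new StringBuilder(solrBase).append("select/?q=").append(encode(query));
        if (facetFields.length > 0) {
            url.append("&facet=true");
            for (String field : facetFields) {
                url.append("&facet.field=").append(encode(field));
            }
        }
        return url.toString();
    }

    /** URL-encodes a parameter value (spaces become '+'). */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "per" and "loc" are the facet fields used in the tutorial configuration.
        System.out.println(build("http://localhost:8983/solr/", "putin", "per", "loc"));
    }
}
```

Opening the printed URL in a browser should return the same result list as the plain query, plus facet counts for the requested fields if they are indexed.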
![Page 16: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/16.jpg)
EXAMPLE APPLICATION
http://localhost:8983/testbed/results.jsp?query=putin
![Page 17: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/17.jpg)
EXAMPLE DOCUMENT
![Page 18: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/18.jpg)
CONFIGURATION FILE
<lk-application logDir="log" appDir="devapps/example">
  <corpus dir="corpora/examples/smallarc" format="arc"/>
  <image-pipeline>
    <annotators>
    </annotators>
  </image-pipeline>
  <pipeline>
    <annotators>
      <annotator exec="./opennlp"/>
      <annotator exec="./sst"/>
    </annotators>
  </pipeline>
  <visualize/>
  <indexer solrHomeDir="solr/solr"
           solrDataDir="solr/solr/data"
           converter="conf/testbed/tutorial-lk2solr.xml"/>
  <searcher appTitle="LivingKnowledge - Example Application" appShortTitle="Example Application" appUrl="http://localhost:8983/solr/">
    <facets>
      <facet field="per" description="Person"/>
      <facet field="loc" description="Location"/>
    </facets>
  </searcher>
</lk-application>
![Page 19: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/19.jpg)
TEXT ANALYSIS
Original pipeline:
<pipeline>
  <annotators>
    <annotator exec="./opennlp"/>
    <annotator exec="./sst"/>
  </annotators>
</pipeline>
Extended pipeline:
<pipeline>
  <annotators>
    <annotator exec="./opennlp"/>
    <annotator exec="./sst"/>
    <annotator exec="./facts"/>
    <annotator exec="./unitn_tagger"/>
    <annotator exec="./unitn_subjexpr"/>
  </annotators>
</pipeline>
apps/testbed –run pipeline conf/testbed/tutorial-application.xml
apps/testbed –run visualization conf/testbed/tutorial-application.xml
![Page 20: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/20.jpg)
TEXT ANALYSIS - FACTS
devapps/example/data/lkxml/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.facts.xml
![Page 21: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/21.jpg)
TEXT ANALYSIS - FACTS
devapps/example/data/html/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.html
![Page 22: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/22.jpg)
IMAGE ANALYSIS
<pipeline>
  <annotators>
    <annotator exec="./opennlp"/>
    <annotator exec="./sst"/>
    <annotator exec="./facts"/>
    <annotator exec="./unitn_tagger"/>
    <annotator exec="./unitn_subjexpr"/>
    <annotator exec="./imageannots"/>
  </annotators>
</pipeline>
<image-pipeline>
  <annotators>
    <annotator exec="./soton_haarfacedetector"/>
  </annotators>
</image-pipeline>
apps/testbed –run pipeline,image-pipeline –pipeline imageannots conf/testbed/tutorial-application.xml
ls devapps/example/data/lkxml/img/*
![Page 23: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/23.jpg)
ANALYSIS API
• Documents are in LK XML format
• Annotators are passed a single document directory – they should add annotations for each document in the directory
• Files will have a consistent naming convention
– LkText file = id + ".lktext.xml"
– LkMedia = id + ".lkmedia.xml"
– LkAnnotation = id + "." + annotatorId + ".xml"
• Annotators will be processed sequentially in the order listed in the XML file
• Annotators can be written in any language but must run on Linux – helper classes will exist for Java, but there is no obligation to use them
• Add application calling your new annotator to apps directory
• Add your application to the configuration file as before
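The naming convention above is simple enough to capture in a few helper methods. This is a small sketch only; the class `LkFileNames` is hypothetical and is not one of the Java helper classes shipped with the testbed.

```java
public class LkFileNames {

    /** File holding the text of a document: id + ".lktext.xml". */
    static String lkText(String id) {
        return id + ".lktext.xml";
    }

    /** File holding the media description of a document: id + ".lkmedia.xml". */
    static String lkMedia(String id) {
        return id + ".lkmedia.xml";
    }

    /** File holding one annotator's output: id + "." + annotatorId + ".xml". */
    static String lkAnnotation(String id, String annotatorId) {
        return id + "." + annotatorId + ".xml";
    }

    public static void main(String[] args) {
        // Document id taken from the facts example shown earlier in the slides.
        String id = "EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713";
        System.out.println(lkText(id));
        System.out.println(lkMedia(id));
        System.out.println(lkAnnotation(id, "facts"));
    }
}
```

An annotator written in another language only needs to follow the same three string patterns when reading its input and writing its output.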
![Page 24: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/24.jpg)
ANALYSIS API – JAVA
• Extend class org.diversityengine.annotator.AbstractAnnotator
• Implement methods
– getName()
– getType() – TEXT or IMAGE
• For image analysis implement
– LkAnnotation getLkAnnotation(ImageDocument document)
• For text analysis implement
– LkAnnotation getLkAnnotation(TextDocument document)
• In main, instantiate and call annotator
– NewAnnotator annotator = new NewAnnotator();
– annotator.processDirectory(args[0]);
• Add application calling your new annotator to apps directory
• Add your application to the configuration file as before
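The shape of such an annotator can be sketched as follows. Because the real testbed classes are not reproduced here, the types `AbstractAnnotator`, `TextDocument`, `LkAnnotation` and `AnnotatorType` below are simplified, hypothetical stand-ins that only mirror the method names on the slide; their fields and constructors are assumptions, and `processDirectory` is left out. The toy annotator itself (capitalised words as candidate entities) is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotatorSketch {

    // ---- Hypothetical stand-ins for the testbed types named on the slide ----
    enum AnnotatorType { TEXT, IMAGE }

    static class TextDocument {
        final String id;
        final String text;
        TextDocument(String id, String text) { this.id = id; this.text = text; }
    }

    static class LkAnnotation {
        final String docId;
        final String annotatorId;
        final List<String> values = new ArrayList<String>();
        LkAnnotation(String docId, String annotatorId) {
            this.docId = docId;
            this.annotatorId = annotatorId;
        }
    }

    abstract static class AbstractAnnotator {
        abstract String getName();
        abstract AnnotatorType getType();
        abstract LkAnnotation getLkAnnotation(TextDocument document);
    }

    // ---- A toy text annotator: every capitalised token is a candidate entity ----
    static class CapitalisedWordAnnotator extends AbstractAnnotator {
        String getName() { return "capwords"; }

        AnnotatorType getType() { return AnnotatorType.TEXT; }

        LkAnnotation getLkAnnotation(TextDocument document) {
            LkAnnotation annotation = new LkAnnotation(document.id, getName());
            for (String token : document.text.split("\\s+")) {
                if (token.length() > 0 && Character.isUpperCase(token.charAt(0))) {
                    annotation.values.add(token);
                }
            }
            return annotation;
        }
    }

    public static void main(String[] args) {
        CapitalisedWordAnnotator annotator = new CapitalisedWordAnnotator();
        LkAnnotation result =
                annotator.getLkAnnotation(new TextDocument("doc1", "Putin met Merkel in Berlin"));
        System.out.println(result.annotatorId + ": " + result.values);
    }
}
```

With the real base class, the `main` method would instead call `processDirectory(args[0])` as shown on the slide, and the framework would take care of reading documents and writing annotation files.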
![Page 25: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/25.jpg)
EVALUATION
<lk-application logDir="log" appDir="devapps/evaluation">
  <corpus dir="corpora/evaluation/sst/text/" format="bbn"/>
  <pipeline>
    <annotators>
      <annotator exec="./sst"/>
    </annotators>
  </pipeline>
  <evaluation evalDir="evaluation/sst/">
    <evaluator provides="ENTITIES" goldDir="corpora/evaluation/sst/gold/" goldAnnotator="sstgold" annotator="sst"/>
  </evaluation>
</lk-application>
Evaluation works with the same configuration file: simply add an evaluation element.
apps/testbed conf/evaluation/sst.xml
![Page 26: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/26.jpg)
EVALUATION RESULTS
cat evaluation/sst/sst.ENTITIES.xml
<evaluation goldDir="/home/mikemat/code/livingknowledge/WP6/testbed/corpora/evaluation/sst/gold/" lkDir="/home/mikemat/code/livingknowledge/WP6/testbed/devapps/evaluation/data/lkxml" annotation="sst" goldAnnotation="sstgold" provides="ENTITIES">
  <docs>
    <doc id="WSJ0375" N="19" tp="18" fp="1" fn="1"/>
    <doc id="WSJ0380" N="19" tp="15" fp="4" fn="1"/>
    <doc id="WSJ0376" N="72" tp="61" fp="11" fn="7"/>
    <doc id="WSJ0377" N="26" tp="17" fp="9" fn="6"/>
    <doc id="WSJ0378" N="10" tp="10" fp="0" fn="0"/>
    <doc id="WSJ0379" N="24" tp="19" fp="5" fn="2"/>
  </docs>
  <totals N="170" tp="140" fp="30" fn="17" p="0.8235294117647058" r="0.89171974522293" f="0.8562691131498471"/>
</evaluation>
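The totals in the evaluation output are consistent with the standard definitions of precision, recall and F-measure computed from the summed true positives (tp), false positives (fp) and false negatives (fn). The sketch below recomputes them; it illustrates the formulas and is not the evaluation framework's own code.

```java
import java.util.Locale;

public class EvalMetrics {

    /** Precision = tp / (tp + fp); defined as 0 when nothing was predicted. */
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = tp / (tp + fn); defined as 0 when there is nothing to find. */
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Balanced F-measure: harmonic mean of precision and recall. */
    static double f1(double p, double r) {
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Totals from the <totals> element: tp=140, fp=30, fn=17.
        double p = precision(140, 30);
        double r = recall(140, 17);
        System.out.println(String.format(Locale.ROOT, "p=%.4f r=%.4f f=%.4f", p, r, f1(p, r)));
        // prints "p=0.8235 r=0.8917 f=0.8563"
    }
}
```

Note that N in the output equals tp + fp (170 = 140 + 30), i.e. the number of annotations the pipeline produced.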
![Page 27: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/27.jpg)
INDEXING AND SEARCH
• Search Engines – Traditional
– Bag-of-words representation
– Inverted index (words -> documents) for efficiency
– 10 docs ranked according to tf-idf similarity with the query
• Search Engines – Today
– Much metadata associated with documents
– Ranking based on 100s of features (date, location, PageRank, click data, personalization, etc.)
– Richer display: facets for exploratory search, answers when appropriate, etc.
• Many open source options – Lucene/Solr most widely used
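The traditional model can be illustrated in a few lines of Java. This is a deliberately simplified sketch: it scores a document by summing tf × idf over the query terms (with idf = ln(N/df)) rather than computing a full cosine similarity, and it is not how Lucene/Solr implement ranking. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSearch {

    // Inverted index: term -> (document id -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> index = new HashMap<String, Map<Integer, Integer>>();
    private int docCount = 0;

    /** Bag-of-words tokenization: lower-case, split on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (t.length() > 0) terms.add(t);
        }
        return terms;
    }

    /** Adds a document to the index and returns its id. */
    int add(String text) {
        int docId = docCount++;
        for (String term : tokenize(text)) {
            Map<Integer, Integer> postings = index.get(term);
            if (postings == null) {
                postings = new HashMap<Integer, Integer>();
                index.put(term, postings);
            }
            Integer tf = postings.get(docId);
            postings.put(docId, tf == null ? 1 : tf + 1);
        }
        return docId;
    }

    /** Returns the ids of the top-k documents, best first; ties are broken by lower id. */
    List<Integer> search(String query, int k) {
        final Map<Integer, Double> scores = new HashMap<Integer, Double>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> postings = index.get(term);
            if (postings == null) continue;
            double idf = Math.log((double) docCount / postings.size());
            for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
                Double old = scores.get(e.getKey());
                scores.put(e.getKey(), (old == null ? 0.0 : old) + e.getValue() * idf);
            }
        }
        List<Integer> ranked = new ArrayList<Integer>(scores.keySet());
        Collections.sort(ranked, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                int c = Double.compare(scores.get(b), scores.get(a));
                return c != 0 ? c : a.compareTo(b);
            }
        });
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    public static void main(String[] args) {
        TfIdfSearch engine = new TfIdfSearch();
        engine.add("Putin visits Berlin");
        engine.add("Berlin hosts film festival");
        engine.add("Putin Putin and the election");
        System.out.println(engine.search("putin", 10)); // prints "[2, 0]"
    }
}
```

Only the postings of the query terms are touched at search time, which is the efficiency argument for the inverted index.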
![Page 29: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/29.jpg)
FACETED SEARCH
Diagram by Yonik Seeley
![Page 30: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/30.jpg)
FACETED SEARCH
• Summarize query results by aggregating properties of the returned pages
– price ranges for product query
– related people or locations for news query
• Exploratory Search
– Show documents that match the query term and a selected facet
– Make inferences not clear from simple document list
• Living Knowledge analysis is modeled very well by facets
– Topics as determined by entity and fact extraction
– Location and Time diversity dimensions
– Opinions as determined by opinion extraction
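The aggregation behind a facet can be sketched without a search engine: filter the documents that match the query, then count the values of one annotation field among them. The example below is an in-memory illustration with made-up documents; Solr computes facets over its index, and the class and field layout here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetCount {

    /** A document: its text plus annotation values per facet field (e.g. "per", "loc"). */
    static class Doc {
        final String text;
        final Map<String, List<String>> fields = new HashMap<String, List<String>>();
        Doc(String text) { this.text = text; }
        Doc with(String field, String... values) {
            fields.put(field, Arrays.asList(values));
            return this;
        }
    }

    /** Counts values of the field over documents containing the term, most frequent first. */
    static Map<String, Integer> facet(List<Doc> docs, String term, String field) {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Doc d : docs) {
            if (!d.text.toLowerCase().contains(term.toLowerCase())) continue;
            List<String> values = d.fields.get(field);
            if (values == null) continue;
            for (String v : new HashSet<String>(values)) { // each value counts once per document
                Integer c = counts.get(v);
                counts.put(v, c == null ? 1 : c + 1);
            }
        }
        List<String> keys = new ArrayList<String>(counts.keySet());
        Collections.sort(keys, new Comparator<String>() {
            public int compare(String a, String b) {
                int c = counts.get(b).compareTo(counts.get(a));
                return c != 0 ? c : a.compareTo(b);
            }
        });
        Map<String, Integer> sorted = new LinkedHashMap<String, Integer>();
        for (String k : keys) sorted.put(k, counts.get(k));
        return sorted;
    }

    /** Made-up sample collection used only for illustration. */
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("Putin meets Merkel in Berlin").with("loc", "Berlin"),
                new Doc("Putin returns to Moscow").with("loc", "Moscow"),
                new Doc("Putin gives speech in Moscow").with("loc", "Moscow"),
                new Doc("Film festival opens in Berlin").with("loc", "Berlin"));
    }

    public static void main(String[] args) {
        System.out.println(facet(sampleDocs(), "putin", "loc")); // prints "{Moscow=2, Berlin=1}"
    }
}
```

Selecting a facet value then amounts to adding one more filter (for example loc = Moscow) on top of the original query.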
![Page 31: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/31.jpg)
LK XML TO SOLR
• Solr has a well-defined XML input format for adding new documents
• Diversity Engine provides a simple language to map LK XML to Solr XML
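For orientation, Solr's XML update format wraps each document in `<add><doc>` and represents every value as a `<field name="...">` element, repeating the element for multi-valued fields. The sketch below generates such a message; it is not the testbed's converter, and the field names (`id`, `per`, `loc`) depend on the Solr schema in use and are assumed here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SolrAddXml {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders one document as a Solr <add> message; multi-valued fields repeat <field>. */
    static String toAddXml(Map<String, List<String>> fields) {
        StringBuilder xml = new StringBuilder("<add>\n  <doc>\n");
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            for (String value : field.getValue()) {
                xml.append("    <field name=\"").append(escape(field.getKey())).append("\">")
                   .append(escape(value)).append("</field>\n");
            }
        }
        return xml.append("  </doc>\n</add>\n").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order in the output.
        Map<String, List<String>> fields = new LinkedHashMap<String, List<String>>();
        fields.put("id", Arrays.asList("doc-1"));
        fields.put("per", Arrays.asList("Vladimir Putin", "Angela Merkel"));
        fields.put("loc", Arrays.asList("Berlin"));
        System.out.print(toAddXml(fields));
    }
}
```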
![Page 32: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/32.jpg)
LK2SOLR CONVERSION
<lktosolr>
  <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.PerValueFilter"/>
  <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.LocValueFilter"/>
  <field solr="keywords" annotation="TOP_ENTITIES" value="$text"/>
  <field solr="pubdate" annotation="metainfo:lktext" value="date" type="date"/>
</lktosolr>
• solr – Name of the field in Solr
• annotation – Name of the LK XML annotation
• value – Value of the annotation
• filter – Allows post-processing on the annotation
• type – Only date is supported currently
<indexer solrHomeDir="solr/solr"
         solrDataDir="solr/solr/data"
         converter="conf/testbed/tutorial-lk2solr.xml"/>
![Page 33: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/33.jpg)
ADDING FACTS TO INDEX
<lktosolr>
  <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.PerValueFilter"/>
  <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.LocValueFilter"/>
  <field solr="keywords" annotation="TOP_ENTITIES" value="$text"/>
  <field solr="pubdate" annotation="metainfo:lktext" value="date" type="date"/>
  <field solr="yago" annotation="yago-entities" value="$text"/>
  <field solr="yago-country" annotation="facts"
         value="xpath:/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()"/>
</lktosolr>
apps/testbed –run convert-solr conf/testbed/tutorial-application.xml
ls devapps/example/data/solr/*
apps/testbed –run index conf/testbed/tutorial-application.xml
![Page 34: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/34.jpg)
FACTS TO SOLR
<field solr="yago" annotation="yago-entities" value="$text" />
![Page 35: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/35.jpg)
FACTS TO SOLR
<field solr="yago-country" annotation="facts"
       value="xpath:/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()"/>
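To see what the XPath expression selects, it can be evaluated with the standard `javax.xml.xpath` API. The XML fragments below are invented for illustration: the real layout of a facts annotation is not shown in the slides, so the element nesting is an assumption inferred from the expression itself, and `some_other_type` is a placeholder.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FactXPath {

    /** The expression from the yago-country field, without the "xpath:" prefix. */
    static final String COUNTRY_XPATH =
            "/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()";

    /** Evaluates the expression against the XML and returns the result as a string. */
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumed structure: an entity with an id and a list of typed facts.
        String country = "<entity-information><id>Russia</id>"
                + "<facts><type>wordnet_country_108544813</type></facts></entity-information>";
        String other = "<entity-information><id>Vladimir_Putin</id>"
                + "<facts><type>some_other_type</type></facts></entity-information>";

        System.out.println("[" + evaluate(country, COUNTRY_XPATH) + "]"); // prints "[Russia]"
        System.out.println("[" + evaluate(other, COUNTRY_XPATH) + "]");   // prints "[]"
    }
}
```

Only entities carrying the country type yield a value, so the `yago-country` field stays empty for every other entity.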
![Page 36: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/36.jpg)
ADDING IMAGES TO INDEX
<lktosolr>
  <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.PerValueFilter"/>
  <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.LocValueFilter"/>
  <field solr="keywords" annotation="TOP_ENTITIES" value="$text"/>
  <field solr="yago" annotation="yago-entities" value="$text"/>
  <field solr="yago-country" annotation="facts"
         value="xpath:/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()"/>
  <field solr="pubdate" annotation="metainfo:lktext" value="date" type="date"/>
  <field solr="image" annotation="IMAGE_ANNOTS" value="$text"/>
  <field solr="bestimage" annotation="BEST_IMAGES" value="$text"/>
</lktosolr>
apps/testbed –run convert-solr conf/testbed/tutorial-application.xml
ls devapps/example/data/solr/*
apps/testbed –run index conf/testbed/tutorial-application.xml
![Page 37: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/37.jpg)
APPLICATION DEVELOPMENT
• Examples
• HTML Extraction
• Scaling to Large Collections
• Provenance
• Some pluggable GUI components
![Page 38: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/38.jpg)
FACT/IMAGE APPLICATION
<searcher appTitle="LivingKnowledge - Example Application" appShortTitle="Example Application" appUrl="http://localhost:8983/solr/">
  <facets>
    <facet field="yago" description="Yago"/>
    <facet field="yago-country" description="Country"/>
    <facet field="per" description="Person"/>
    <facet field="loc" description="Location"/>
    <facet field="image" description="Images"/>
  </facets>
</searcher>
ant deploy-testbed
![Page 39: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/39.jpg)
FACT/IMAGE APPLICATION
http://localhost:8983/testbed/results.jsp?query=putin
![Page 40: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/40.jpg)
OPINION APPLICATION
Opinions are at sentence level, not document level – same analysis, but different indexing
cat conf/testbed/tutorial-lk2solr-sentence.xml
<lktosolr solrDoc="SENTENCES" contextSize="1">
  <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.PerValueFilter"
         source="solrdoc"/>
  <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
         filter="org.diversityengine.solr.converter.filters.LocValueFilter"
         source="solrdoc"/>
  <field solr="keywords" annotation="TOP_ENTITIES" value="$text"/>
  <field solr="yago" annotation="yago-entities" value="$text" source="solrdoc"/>
  <field solr="image" annotation="IMAGE_ANNOTS" value="$text"/>
  <field solr="bestimage" annotation="BEST_IMAGES" value="$text"/>
  <field solr="pubdate" annotation="metainfo:lktext" value="date" type="date"/>
  <field solr="polarity"
         annotation="MPQA-expressive-subjectivity,MPQA-direct-subjective"
         value="xpath:/node()[@pol]/@pol" source="solrdoc"
         filter="org.diversityengine.solr.converter.filters.PolarityValueFilter"/>
  <field solr="pol-int"
         annotation="MPQA-expressive-subjectivity,MPQA-direct-subjective"
         value="xpath:concat(/node()[@pol and @int]/@pol,/node()[@int and @pol]/@int)"
         source="solrdoc"/>
</lktosolr>
apps/testbed –run convert-solr,index conf/testbed/tutorial-application-sentence.xml
ls devapps/example/data/solr/*
![Page 41: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/41.jpg)
SOLR XML – SENTENCE
![Page 42: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/42.jpg)
OPINION APPLICATION
Modify webapp\WEB-INF\web.xml:
<web-app xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" version="2.5">
  <description>LivingKnowledge Testbed Example Application</description>
  <display-name>Testbed Examples</display-name>
  <context-param>
    <param-name>applicationDef</param-name>
    <param-value>conf/testbed/tutorial-application-sentence.xml</param-value>
    <description>The Living Knowledge application description XML file</description>
  </context-param>
</web-app>
Then redeploy:
ant deploy-testbed
![Page 43: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/43.jpg)
OPINION APPLICATION
http://localhost:8983/testbed/results.jsp?query=putin
![Page 44: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/44.jpg)
HTML EXTRACTION
Page regions labelled in the screenshot: Headline, Main Article, Other Stuff
![Page 45: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/45.jpg)
HTML EXTRACTION
• Boilerplate can lead to false positive results and inaccurate facet aggregation
• Real example – before extraction was developed, the most common person for most queries was whoever appeared in a top story title (shown on all pages) on the day of the crawl!
• Titles, authors and dates are important for bias- and diversity-aware search
![Page 46: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/46.jpg)
PROVENANCE
• How an annotation is derived is often as important as the annotation itself
– Users want to verify results
– Developers need to validate results
• Open Provenance provides an open source solution
– Testbed annotations can be extended with Open Provenance chains
![Page 47: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/47.jpg)
PROVENANCE DIAGRAM
![Page 48: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/48.jpg)
SCALING TO LARGE COLLECTIONS
• In the real world, even "small" datasets have millions of documents
• NLP/Image processing is expensive – at 1 doc/sec, 1 million docs take more than 11 days!
• Hadoop Mapper allows for scaling – scales linearly with number of machines
• ZipCollection writer allows partitioning data into subsets for processing
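The arithmetic behind these figures is easy to reproduce. The sketch below assumes the ideal linear scaling stated on the slide; the hash-based `partition` method is only one possible way to split a collection into subsets and is not a description of how the ZipCollection writer works.

```java
import java.util.Locale;

public class ScalingEstimate {

    private static final double SECONDS_PER_DAY = 86400.0;

    /** Processing time in days, assuming throughput scales linearly with machines. */
    static double days(long docs, double docsPerSecondPerMachine, int machines) {
        double seconds = docs / (docsPerSecondPerMachine * machines);
        return seconds / SECONDS_PER_DAY;
    }

    /** Assigns a document id to one of n partitions (hypothetical hash-based scheme). */
    static int partition(String docId, int partitions) {
        // Mask the sign bit so the result is never negative.
        return (docId.hashCode() & Integer.MAX_VALUE) % partitions;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.2f days on 1 machine",
                days(1000000L, 1.0, 1)));   // prints "11.57 days on 1 machine"
        System.out.println(String.format(Locale.ROOT, "%.2f days on 10 machines",
                days(1000000L, 1.0, 10)));  // prints "1.16 days on 10 machines"
        System.out.println("partition: " + partition("WSJ0375", 10));
    }
}
```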
![Page 49: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/49.jpg)
COMPONENTS - OPINIONS
![Page 50: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/50.jpg)
COMPONENTS - TIME
![Page 51: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/51.jpg)
COMPONENTS - GEO
![Page 52: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/52.jpg)
FUTURE WORK
• More components
• Maven to manage dependencies
• Better integration of Timeline and Geo visualization components
• Integration of ranking algorithms
• Better documentation
![Page 53: SYMPOSIUM ON BIAS AND DIVERSITY IN IR A TESTBED FOR DIVERSIFICATON IN SEARCH](https://reader035.vdocuments.us/reader035/viewer/2022070423/56816782550346895ddc9157/html5/thumbnails/53.jpg)
THANKS!
• LivingKnowledge Partners!
• You for coming!!
• Questions?