Designing and implementing search solutions
© FINDWISE 2012
Implementing and designing search solutions
Gothenburg University – Gothenburg – 2012-03-08
• Founded in 2005
• Offices in Sweden, Denmark, Norway and Poland
• 72 employees (February 2012)
• Our objective is to be a leading provider of Findability solutions utilising the full potential of search technology to create customer business value
About Findwise
Technology independent
Creating search-driven Findability solutions based on market-leading commercial and open source search technology platforms:
• Autonomy IDOL
• Microsoft (SharePoint and FAST Search products)
• Google GSA
• IBM ICA/OmniFind
• LucidWorks
• Apache Lucene/Solr (open source)
• and more…
Findability Challenges
Employee productivity (DN article, March 2011):
“The effort to find the right information costs an average company 80,000 SEK per employee per year”
Customer Service quality and efficiency (Accenture report, March 2011):
“69% of agents don't have answers to help service customers”
E-commerce conversion rate (Google survey, December 2010):
“77% of those surveyed used search within an e-commerce website to find products”
Search core - overview
Documents
Document 1
Title: Brown fox
Content: The quick brown fox jumps over the lazy dog
Author: Tobias Berg
Document 2
Title: My dog
Content: My old dog cannot jump anymore
Author: Svetoslav Marinov
Inverted index
Term      Documents
…         …
fox       1
jump      1, 2
lazy      1
dog       1, 2
tobias    1
berg      1
…         …
Tokenization, stemming, stop-word…
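The index on this slide can be rebuilt in a few lines of Java. This is a minimal sketch for illustration, not how any particular search engine implements it: the class name, the tiny stop-word list and the lookup table standing in for a real stemmer are all assumptions made for the example.

```java
import java.util.*;

class InvertedIndexSketch {
    // Hypothetical stop-word list, kept tiny on purpose.
    static final Set<String> STOP_WORDS = Set.of("the", "my", "over");
    // Hypothetical lookup table standing in for a real stemmer.
    static final Map<String, String> STEMS = Map.of("jumps", "jump");

    // term -> ids of the documents that contain the term
    final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Tokenize on non-letters, lower-case, drop stop words, then "stem".
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            terms.add(STEMS.getOrDefault(token, token));
        }
        return terms;
    }

    // Index every field of a document under the same document id.
    void add(int docId, String... fields) {
        for (String field : fields) {
            for (String term : analyze(field)) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // The query word goes through the same analysis as the documents.
    SortedSet<Integer> lookup(String word) {
        List<String> terms = analyze(word);
        if (terms.isEmpty()) return new TreeSet<>();
        return postings.getOrDefault(terms.get(0), new TreeSet<>());
    }

    // The two documents from the slide.
    static InvertedIndexSketch slideExample() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Brown fox", "The quick brown fox jumps over the lazy dog", "Tobias Berg");
        index.add(2, "My dog", "My old dog cannot jump anymore", "Svetoslav Marinov");
        return index;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = slideExample();
        System.out.println("jumps -> " + index.lookup("jumps"));
        System.out.println("fox   -> " + index.lookup("fox"));
    }
}
```

Because the query word is analyzed the same way as the documents, a search for "jumps" lands on the term "jump" and finds both documents.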
Relevancy
Diagram labels: Retrieved documents, Relevant documents
•Precision – how many of the retrieved documents are relevant?
•Recall – how many of the relevant documents were retrieved?
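The two measures can be computed directly from the set of retrieved documents and the set of relevant documents. The sketch below is only an illustration: the document ids are made up, and returning 0 when a set is empty is an assumption of this example (the measure is strictly undefined in that case).

```java
import java.util.*;

class PrecisionRecallSketch {
    // Number of documents that are both retrieved and relevant.
    static int overlap(Set<Integer> retrieved, Set<Integer> relevant) {
        Set<Integer> both = new HashSet<>(retrieved);
        both.retainAll(relevant);
        return both.size();
    }

    // Share of the retrieved documents that are relevant.
    static double precision(Set<Integer> retrieved, Set<Integer> relevant) {
        if (retrieved.isEmpty()) return 0.0; // assumption: treat "nothing retrieved" as 0
        return (double) overlap(retrieved, relevant) / retrieved.size();
    }

    // Share of the relevant documents that were retrieved.
    static double recall(Set<Integer> retrieved, Set<Integer> relevant) {
        if (relevant.isEmpty()) return 0.0; // assumption: treat "nothing relevant" as 0
        return (double) overlap(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical document ids, not taken from the slides.
        Set<Integer> retrieved = Set.of(1, 2, 3, 4);
        Set<Integer> relevant = Set.of(2, 4, 5, 6, 7);
        System.out.println("precision = " + precision(retrieved, relevant)); // 2 of 4 retrieved are relevant
        System.out.println("recall    = " + recall(retrieved, relevant));    // 2 of 5 relevant were retrieved
    }
}
```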
Relevancy
Recall: find everything related to the query
- lemmatization
- synonyms
- wildcards
- anti-phrasing
- or-operator
Precision: find only entities related to the query
- exact word matching
- exact phrase matching
- and-operator
Goal: improve precision without sacrificing recall
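The effect of the or-operator and the and-operator can be seen on the posting lists from the earlier inverted index slide ("lazy" occurs in document 1, "dog" in documents 1 and 2). The following is a rough sketch under that assumption; the class and method names are invented for the example.

```java
import java.util.*;

class BooleanOperatorSketch {
    // Posting lists taken from the inverted index slide.
    static final SortedSet<Integer> LAZY = new TreeSet<>(List.of(1));
    static final SortedSet<Integer> DOG = new TreeSet<>(List.of(1, 2));

    // OR: union of the posting lists, favours recall.
    static SortedSet<Integer> or(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // AND: intersection of the posting lists, favours precision.
    static SortedSet<Integer> and(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        System.out.println("lazy OR dog  -> " + or(LAZY, DOG));
        System.out.println("lazy AND dog -> " + and(LAZY, DOG));
    }
}
```

The OR query returns both documents, including the one that only mentions "dog", while the AND query keeps only the document that contains both terms.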
Search core – relevance score
• TF/IDF
• Field length
• Field weight
  • Title *2
  • Author *4
  • Content *1
• Freshness
• …
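One way these signals might be combined is sketched below. It is an assumption-laden toy, not the scoring formula of any specific engine: the IDF variant (ln(N/df) + 1), the division by the square root of the field length, and all names are choices made for this example, and freshness is left out. Only the field weights come from the slide.

```java
import java.util.*;

class RelevanceScoreSketch {
    // Field weights from the slide: Title *2, Author *4, Content *1.
    static final Map<String, Double> FIELD_WEIGHTS =
            Map.of("title", 2.0, "author", 4.0, "content", 1.0);

    // Naive analysis: lower-case and split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Inverse document frequency; this smoothing variant is an assumption.
    static double idf(String term, List<Map<String, String>> docs) {
        long df = docs.stream()
                .filter(d -> d.values().stream().anyMatch(v -> tokens(v).contains(term)))
                .count();
        return df == 0 ? 0.0 : Math.log((double) docs.size() / df) + 1.0;
    }

    // Sum over fields: weight * term frequency * idf, damped by field length.
    static double score(String term, Map<String, String> doc, List<Map<String, String>> docs) {
        double idf = idf(term, docs);
        double total = 0.0;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            List<String> toks = tokens(field.getValue());
            long tf = toks.stream().filter(term::equals).count();
            double weight = FIELD_WEIGHTS.getOrDefault(field.getKey(), 1.0);
            total += weight * tf * idf / Math.sqrt(toks.size());
        }
        return total;
    }

    // The two documents from the search core overview slide.
    static List<Map<String, String>> slideDocs() {
        return List.of(
                Map.of("title", "Brown fox",
                       "content", "The quick brown fox jumps over the lazy dog",
                       "author", "Tobias Berg"),
                Map.of("title", "My dog",
                       "content", "My old dog cannot jump anymore",
                       "author", "Svetoslav Marinov"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = slideDocs();
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc " + (i + 1) + " score for 'dog': " + score("dog", docs.get(i), docs));
        }
    }
}
```

In this sketch the second document outranks the first for the query "dog", because the term also appears in its short, higher-weighted title field.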
Search Core
•Optimized for full-text search
•Sub-second responses
•Tunable relevance
•Scalable
•Configurable & Extendable
{query}
Find matching documents
Score documents
{result}
Connectors – fetch data
Database connector
Id  Product name  Description                         Price
1   Wheel         Makes the bus go round round round  45
2   Window        A shield of glass                   12
Id  Book name             Abstract       Author
1   Ulysses               Irish novel    James Joyce
2   Crime and Punishment  Russian novel  Dostoevsky, Fyodor
Connector framework – code example
public void execute() {
    // Insert code to fetch content
}
public void interrupt() {
    // Insert code to handle interrupt signal
}
public void init() {
    // Insert code to initialize connector
}
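To make the three methods concrete, here is a rough sketch of a connector that turns table rows into documents. Everything in it is hypothetical: the `Connector` interface, the `TableConnector` class and the `documents()` accessor are invented for the example, and an in-memory list stands in for a real database so the code stays self-contained. The rows are the product table from the database connector slide.

```java
import java.util.*;

class ConnectorSketch {
    // Hypothetical interface mirroring the three methods on the slide.
    interface Connector {
        void init();
        void execute();
        void interrupt();
    }

    // Stand-in for a database connector: rows come from memory, not from JDBC.
    static class TableConnector implements Connector {
        private final String[] columns;
        private final List<String[]> rows;
        private final List<Map<String, String>> fetched = new ArrayList<>();
        private volatile boolean interrupted;

        TableConnector(String[] columns, List<String[]> rows) {
            this.columns = columns;
            this.rows = rows;
        }

        // Reset state before a new run.
        public void init() {
            fetched.clear();
            interrupted = false;
        }

        // Fetch content: one document per row, one field per column.
        public void execute() {
            for (String[] row : rows) {
                if (interrupted) break;
                Map<String, String> doc = new LinkedHashMap<>();
                for (int i = 0; i < columns.length; i++) {
                    doc.put(columns[i], row[i]);
                }
                fetched.add(doc);
            }
        }

        // Ask a running execute() to stop before the next row.
        public void interrupt() {
            interrupted = true;
        }

        List<Map<String, String>> documents() {
            return fetched;
        }
    }

    // The product table from the slide.
    static TableConnector productConnector() {
        return new TableConnector(
                new String[] {"Id", "Product name", "Description", "Price"},
                List.of(
                        new String[] {"1", "Wheel", "Makes the bus go round round round", "45"},
                        new String[] {"2", "Window", "A shield of glass", "12"}));
    }

    public static void main(String[] args) {
        TableConnector connector = productConnector();
        connector.init();
        connector.execute();
        connector.documents().forEach(System.out::println);
    }
}
```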
Connector Frameworks
http://incubator.apache.org/connectors/
http://code.google.com/p/google-enterprise-connector-manager/
• Existing connectors
• Re-usable
• Configuration interfaces
• Standardized implementation
Pipeline - overview
• PDF/Office -> Text
• Lemmatization
• Language identification
• NER
• Phonetic search
• Keyword extraction
• External calls
• …
Pipeline framework – code example
protected void addAction(Document doc) throws PipelineException {
    // Insert code
    doc.addField("Title", "Hello world!");
}
protected void updateAction(Document doc) throws PipelineException {
    // Insert code
    addAction(doc);
}
protected void deleteAction(Document doc) throws PipelineException {
    // Insert code
}
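The idea of a pipeline, where each stage enriches the document before it reaches the index, can be sketched as follows. This is not the API of any of the frameworks mentioned in this talk: the `Stage` interface, the `run` method and both toy stages (whitespace clean-up and a "longest word" keyword guess) are assumptions made purely for illustration.

```java
import java.util.*;

class PipelineSketch {
    // Hypothetical stage contract: read and modify the document's fields.
    interface Stage {
        void process(Map<String, String> doc);
    }

    // Toy stage: collapse runs of whitespace in the Content field.
    static final Stage NORMALIZE_WHITESPACE = doc ->
            doc.put("Content", doc.getOrDefault("Content", "").trim().replaceAll("\\s+", " "));

    // Toy stand-in for keyword extraction: pick the first longest word.
    static final Stage LONGEST_WORD_KEYWORD = doc -> {
        String best = "";
        for (String word : doc.getOrDefault("Content", "").split("\\s+")) {
            if (word.length() > best.length()) best = word;
        }
        doc.put("Keyword", best);
    };

    // Run the stages in order on a copy of the incoming document.
    static Map<String, String> run(List<Stage> stages, Map<String, String> doc) {
        Map<String, String> result = new LinkedHashMap<>(doc);
        for (Stage stage : stages) {
            stage.process(result);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("Content", "  The quick   brown fox  ");
        System.out.println(run(List.of(NORMALIZE_WHITESPACE, LONGEST_WORD_KEYWORD), doc));
    }
}
```

Stage order matters here: the keyword stage relies on the clean-up stage having run first, which is why the stages are passed as an ordered list.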
NLP tools and approaches
• Open source: GATE, OpenNLP, UIMA, StanfordNLP, Mallet, Apache Mahout
• Proprietary: IBM LanguageWare
• Own components: e.g. KeywordExtraction Service; LanguageIdentify
• POS taggers – Hunpos, OpenNLP, Mallet
• Dependency Parsers – MaltParser, StanfordParser
• NER – rule-based + statistical models
• Document summarization
• Document clustering
Pipeline frameworks
Findwise Hydra
http://www.pypes.org/
http://www.openpipeline.com/
• Re-usable stages
• Configuration interface
• Focus on task
What the frell is UX design?
• Interaction design
•Usability Engineering
• Information Architecture
•Visual Design
Findwise UX design principles
Users want results
Dialogue not monologue
Participation builds trust
Answer frequent questions
Simple but powerful
Design research
•Be easy to reach – keep contact
•Let users' requests guide you when prioritizing new features
•Listen & try to discover the underlying problem
•Try to find out what the user needs, not what they say they want
Usability tests
•Test early - test often
•Use sketches, paper prototypes, static prototypes and working prototypes!
•Create real tasks or problems
•Don’t ask them how they would want it
•Test on friends and family or colleagues
Summary
•Listen & try to discover the underlying problem
•Search analytics – Top queries
•Do usability tests early & often
• Iterate!
Research
•Collaboration with Universities
GU, Borås, KTH, Copenhagen U.
•EU projects
RUSHES
• Master’s Thesis supervision
Chalmers, KTH, Lund
Master’s Thesis projects
•A way to test ideas
•A way to recruit people
•A way to cooperate with Universities
•Keyword Extraction
•Document Clustering
•NER
•Document summarization
•Extracting structural information from text
•Query log analysis
Resources - books
•The Design of Everyday Things
•Don’t Make Me Think
•Search Analytics for Your Site
•ManifoldCF in Action
•Taming Text
Shameless plug
twitter.com/findwise
slideshare.net/findwise
findabilityblog.se
findwise.com
Tobias Berg
Björn Klockljung Johansson
Svetoslav Marinov
Thanks!