INTRODUCTION TO ENTERPRISE SEARCH
Kristian Norling
• Who is here?
• Your expectations?
• Kristian?
• 2 hours, one break
• Lifetime answer Guarantee on this class
Introduction
• Problem
• History of (web) search
• How we search and find?
• Current state of Enterprise Search + stats
• Technical concept
• Information quality
• Feedback cycle
• Five dimensions of Findability
Agenda
• Growing amounts of Information
• Changing patterns of information consumption
• Information silos
• Web-like behaviour > Information filters
• Internal information use is still in the Digital Stone Age
The Problems
In Academia search is called Information Retrieval.
It is an old discipline, dating back thousands of years...
Basic concepts in Information Retrieval:
Recall and Precision, more later...
History of Search
• Directories are manually compiled taxonomies of websites
• Directories are far more costly and time intensive to maintain
• Directories lack coverage, although they provide an important alternative, especially for novice surfers
• Search engines rely mainly on automated search algorithms
• Search engines rank pages by popularity on the web, the more referrals (links) the more relevant
Directories vs. Search Engines
Yahoo – searchable directory (1994, ~10000 websites)
• Integrates search over its directory. Organized by subject matter. Sites can be suggested, but human editors control quality of directory (~100 dedicated editors)
Ask – natural language search engine (1998)
• Used human editors to match popular queries. Tried different algorithms to rank pages by popularity
Google – searchable index (1998)
• Developed PageRank, a popularity algorithm that hides bad content. Set standards (spellchecking, query suggestion, search results page design)
Early days of Web Search
First generation (1995-97) – AltaVista, Excite, WebCrawler
Uses mostly on-page data (text and formatting).
Informational queries.
Second generation (1998-2010) – Google, Yahoo
Use off-page, web-specific data: link analysis, anchor-text, click-through data. Informational and navigational queries.
Third generation (2010-present) – Google, Wolfram-Alpha, Bing
Blend data from many sources and try to answer “the need behind the query”: semantic analysis, context determination, dynamic database selection etc. Informational, navigational, and transactional queries.
Web Search - evolution
Find information assumed to be available on the web in a static form.
Seeking information modes:
Informational
Reach a particular site that the user has in mind, either because they visited it in the past or because they assume that such a site exists. Usually there is only one "right" result.
Seeking information modes:
Navigational
Reach a site where further interaction will happen. This interaction constitutes the transaction defining these queries. The main categories for such queries are shopping, finding various web-mediated services, downloading various types of files (images, songs, etc.), accessing certain databases (e.g. Yellow Pages type data), finding servers (e.g. for gaming) etc.
Seeking information modes:
Transactional
Finding something when I know what I want and have words to describe it.
Four modes of seeking information
Exploring when I only have some idea of what I want and may lack the words to articulate it.
Four modes of seeking information
Finding relevant items when I don’t know what I need.
Four modes of seeking information
Finding something I have seen before, but can’t remember where.
Four modes of seeking information
•Amount of information is growing every day
•What to Search for?
•Where to Search?
•How to Search?
•Search is simple, complex and powerful
•Findability Dimensions
The State of Enterprise Search
STATS FROM THE
“ENTERPRISE SEARCH AND FINDABILITY SURVEY 2012”
SIGN-UP
HOW CRITICAL IS FINDING THE RIGHT INFORMATION TO BUSINESS GOALS AND
SUCCESS?
EUROPE 76.5%
IMPERATIVE/SIGNIFICANT
IS IT EASY TO FIND THE RIGHT INFORMATION
WITHIN YOUR ORGANISATION TODAY?
EUROPE 77%
MODERATELY/VERY HARD
LEVEL OF SATISFACTION?
EUROPE 18.5%
MOSTLY/VERY SATISFIED
WHAT ARE THE OBSTACLES TO FINDING THE RIGHT
INFORMATION?
63.4% POOR SEARCH FUNCTIONALITY
52.1% DON'T KNOW WHERE TO LOOK
51.4% INCONSISTENCY IN HOW WE TAG
CONTENT
50.0% LACK OF ADEQUATE TAGS
33.1% DON’T KNOW WHAT TO LOOK FOR
Globally
“Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.” http://en.wikipedia.org/wiki/Enterprise_search
Wikipedia Definition
In the field of information retrieval, precision is the fraction of retrieved documents that are relevant to the search.
Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.
Source: Wikipedia
The Concept of Enterprise Search: Precision
Recall in information retrieval is the fraction of the documents that are relevant to the query that are successfully retrieved.
For example, for text search on a set of documents, recall is the number of correct results divided by the number of results that should have been returned.
Source: Wikipedia
The Concept of Enterprise Search: Recall
M number of relevant documents
N number of retrieved documents
R number of retrieved documents that are also relevant
Precision and Recall
Recall = R / M = number of retrieved documents that are also relevant / total number of relevant documents
Precision = R / N = number of retrieved documents that are also relevant / total number of retrieved documents
Precision and Recall
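The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample document ids are hypothetical, the result list is assumed to contain no duplicates, and P@n here divides by n (conventions differ when fewer than n results are returned).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ranked list of document ids returned by the engine (N = its length)
    relevant:  collection of document ids judged relevant to the query (M = its size)
    """
    relevant = set(relevant)
    r = sum(1 for doc in retrieved if doc in relevant)  # R: retrieved AND relevant
    precision = r / len(retrieved) if retrieved else 0.0  # R / N
    recall = r / len(relevant) if relevant else 0.0        # R / M
    return precision, recall


def precision_at_n(retrieved, relevant, n):
    """Precision counted over the top n results only (P@n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved[:n] if doc in relevant)
    return hits / n


# Hypothetical example: 4 results returned, 3 documents are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)                       # R=2, N=4, M=3
print(precision_at_n(retrieved, relevant, 2))  # only d1 of the top 2 is relevant
```

In this example two of the four retrieved documents are relevant (precision 2/4), and two of the three relevant documents were found (recall 2/3).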
...enterprises typically have to use other query-independent factors, such as a document's recency or popularity, along with query-dependent factors traditionally associated with information retrieval algorithms. Also, the rich functionality of enterprise search UIs, such as clustering and faceting, diminish reliance on ranking as the means to direct the user's attention.
Relevance
Source: Wikipedia
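One way to picture the mix of query-dependent and query-independent factors is a weighted blend. The sketch below is purely illustrative: the function name, the weights, the 180-day half-life and the popularity saturation constant are all assumptions, not values from any particular search platform.

```python
def blended_score(text_score, age_days, views,
                  half_life_days=180.0, popularity_k=100.0,
                  w_text=0.6, w_recency=0.2, w_popularity=0.2):
    """Combine a query-dependent score with recency and popularity.

    text_score: query-dependent relevance in [0, 1] (e.g. a normalised text match score)
    age_days:   days since the document was last updated
    views:      how many times the document has been opened
    All weights and constants are hypothetical tuning parameters.
    """
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    popularity = views / (views + popularity_k)   # saturates towards 1
    return w_text * text_score + w_recency * recency + w_popularity * popularity


# Two documents that match the query equally well:
fresh = blended_score(text_score=0.7, age_days=10, views=50)
stale = blended_score(text_score=0.7, age_days=900, views=50)
print(fresh > stale)  # the fresher document ranks higher
```

In practice the weights would be tuned against search analytics and user feedback rather than fixed up front.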
PageRank
We do not have PageRank...
...but we have social!
Social Reconnects Enterprise Search
Emails, People Catalogues, Connections, Tagging, Sharing etc.
Relevance
The Concept of Enterprise Search
Examples of implementations:
- People Search
- Product Search
- Document Search
- Intranet and Website Search
- E-commerce
- Dashboard / Search as a Service
Search based Solutions
• Good Data/Information hygiene
• Crap in = Crap out
• Metadata is very important!
• Taxonomy and Metadata demystified
• TetraPak example (video)
• SimCorp example
• VGR example (video)
Information / Content
HCE (SWEDEN)
DEWEY DECIMAL CLASSIFICATION
Author: Douglas Coupland
Title: Hej Nostradamus!
Publisher: Norstedts
Printed by: Smedjebacken
Year: 2003
Printed: 2004
KristianNorling
Metadata
Semantic
KristianNorling
Example: Ernst & Young
• Metadata
• Titles
• Content Quality
• Information Life Cycle Management
ESEO: Actionable activities
But, an average Search budget is 100K Euro
• TCO
• ROI
• KPI
Search Analytics is key
Show me the Money
Important, delivers actionable to-dos quickly
• 0-results
• Top Terms Searched for
Video: Search Analytics in Practice
Search Analytics
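As a rough sketch of how these two reports can be produced, assume the search log is available as (query, result count) pairs. The log format, the function name and the sample queries below are hypothetical; real search platforms expose this data in their own formats.

```python
from collections import Counter


def summarize_query_log(log):
    """Return (top_terms, zero_results) counters for a search log.

    log: iterable of (query, result_count) pairs (an assumed, simplified format).
    Queries are lower-cased and trimmed so that trivial variants are counted together.
    """
    normalized = [(query.strip().lower(), count) for query, count in log]
    top_terms = Counter(query for query, _ in normalized)
    zero_results = Counter(query for query, count in normalized if count == 0)
    return top_terms, zero_results


# Hypothetical sample log.
log = [
    ("Vacation policy", 12),
    ("vacation policy", 12),
    ("expense report", 0),
    ("expence report", 0),
    ("expense report", 0),
]
top_terms, zero_results = summarize_query_log(log)
print(top_terms.most_common(3))     # most frequent queries
print(zero_results.most_common(3))  # queries that returned nothing
```

Frequent zero-result queries point either to missing content or to missing synonyms and spelling support, which is what makes them quick, actionable to-dos.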
• Feedback form
• KPI from Search Analytics
• Session time × number of sessions = time spent on search; time spent on search × hourly price = cost per “answer”
• Add search refinements + exit page (= is the right answer)
User Satisfaction
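The cost calculation above can be worked through as follows. All figures are invented for illustration, and dividing the total by the number of sessions that ended on an exit page is one possible reading of cost per “answer”, not a definition taken from the slides.

```python
def search_cost(avg_session_minutes, sessions, hourly_price):
    """Total cost of time spent searching.

    session time x number of sessions = time spent on search,
    time spent on search x hourly price = cost.
    """
    hours_spent = avg_session_minutes * sessions / 60
    return hours_spent * hourly_price


def cost_per_answer(total_cost, answered_sessions):
    """Spread the total cost over sessions assumed to have found an answer
    (for example, sessions that ended on an exit page)."""
    if answered_sessions <= 0:
        raise ValueError("answered_sessions must be positive")
    return total_cost / answered_sessions


# Hypothetical figures: 3-minute sessions, 10,000 sessions, 50 EUR per hour.
total = search_cost(avg_session_minutes=3, sessions=10_000, hourly_price=50)
print(total)                          # 500 hours x 50 EUR
print(cost_per_answer(total, 8_000))  # if 8,000 sessions found an answer
```

Tracked over time, a falling cost per answer is a simple KPI to put next to the feedback form results.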
Findability by Findwise
1. BUSINESS
Build solutions to support your business processes and goals
2. INFORMATION
Prepare information to make it findable
3. USERS
Build usable solutions based on user needs
4. ORGANISATION
Govern and improve your solution over time
5. SEARCH TECHNOLOGY
Build solutions based on state-of-the-art search technology
• Analyze how your business goals and strategies can be met by improved information access
• Set Findability goals. Examples: increase the revenue on sales, raise productivity, improve knowledge sharing, better collaboration
• Specify your requirements
• Define KPIs and measure the success of your investments
Business
• Clean up and archive or delete outdated/irrelevant information
• Ensure good quality of information by adding structured and suitable metadata
• Create and use information models and taxonomies
• Tagging?
Information
• Get to know your users and their needs
• Make sure your solution is easy to use
• Perform continuous usability evaluations, like usage tests and expert evaluations
• Make sure users find what they are looking for
• Enable feedback loops for complaints, feedback and praise
Users
• Resources!
• Define processes, roles and routines to govern the solution
• Perform Search Analytics
• Create easy to use administration interfaces
• Perform training, technical and editorial
• Help publishers get started with processes for better findability
Organisation
• Select a suitable search platform or make the most of your current solution
• Design your architecture with search-as-a-service in mind
• Utilise the full potential of the selected technology
Search Technology
Kristian Norling
@kristiannorling
@findwise
findwise.com
Findability Blog
Slideshare
Vimeo
Newsroom
Kristian Norling