Introduction to Enterprise Search

SURE Internal Training

Upload: usama-nada

Post on 16-Jul-2015

Category:

Technology


TRANSCRIPT

Page 1: Introduction to enterprise search

SURE Internal Training

Page 2: Introduction to enterprise search

Works at SURE Technology & Consulting

Technical Team Leader and Enterprise Search Consultant.

In love with ASP.NET, SharePoint, ALM, and software architecture; involved in search technology and search solutions since 2007.

Has working experience with

Profile: http://www.linkedin.com/in/usamanada

Twitter: https://twitter.com/usama_nada

Page 3: Introduction to enterprise search

Let's Start

What Enterprise Search Is

How Search Works

The Business of Search

Page 4: Introduction to enterprise search

General overview

Page 5: Introduction to enterprise search

Problems it came to solve

Different repositories

Data in many formats

Very large volumes

Security concerns

Poor relevancy offered by database solutions

High query rates (queries per second) overloading the database

…

Page 6: Introduction to enterprise search

What is Enterprise Search?

It helps you find your stuff…

Page 7: Introduction to enterprise search

Give me a better definition…

Page 8: Introduction to enterprise search

Search Based Application

A software application in which a search engine platform is used as the core infrastructure for information access and reporting, and whose main purpose is performing a domain-oriented task.

Page 9: Introduction to enterprise search

Search Engine

Effectiveness (quality of results)

As good as possible

Efficiency (response time and throughput)

As quickly as possible
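Effectiveness is commonly quantified with precision and recall. The slide does not name these measures, so treat the sketch below as one standard way to put a number on "quality of results"; the document IDs and relevance judgments are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of returned results that are relevant.
    Recall: share of relevant documents that were returned."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list and human relevance judgments for one query.
returned = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc1", "doc3", "doc7"]

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Efficiency can be measured in the same spirit by timing queries and counting how many complete per second.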

Page 10: Introduction to enterprise search

High-level overview of search concepts and architecture

Page 11: Introduction to enterprise search

How Search Works: Getting the Data

Crawlers

Web Crawler

Focused Crawler

Connectors

Database

ECM

CRM

Exchange

Files
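A crawler can be sketched as a breadth-first walk over links. The example below is a simplification that runs against a hypothetical in-memory "web" (the `FAKE_WEB` dictionary and its URLs are invented) so it needs no network access; a connector plays the same role by swapping in a `fetch` function for a database, ECM, CRM, mail server, or file share.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "http://intranet/home": ("Welcome", ["http://intranet/hr", "http://intranet/it"]),
    "http://intranet/hr": ("HR policies", ["http://intranet/home"]),
    "http://intranet/it": ("IT handbook", ["http://intranet/hr", "http://intranet/missing"]),
}

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, store its text, queue unseen links."""
    queue, seen, documents = deque([seed]), {seed}, {}
    while queue and len(documents) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # unreachable page: skip it
            continue
        text, links = page
        documents[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return documents

crawled = crawl("http://intranet/home", FAKE_WEB.get)
print(list(crawled))
```

A focused crawler would add one more check before queuing a link, for example keeping only pages whose text matches a topic.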

Page 12: Introduction to enterprise search

How Search Works: Processing the Data (Indexing)

Page 13: Introduction to enterprise search

How Search Works

Forward Index

Document      Words
Document 1    the, cow, says, moo
Document 2    the, cat, and, the, hat
Document 3    the, dish, ran, away, with, the, spoon
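The forward index above maps each document to its words. Search engines typically invert it so that each word points to the documents containing it. The inverted index is not shown in this transcript, so the sketch below is an assumption about the usual next step, built from the three example documents.

```python
from collections import defaultdict

# Forward index taken from the slide: document -> words.
forward_index = {
    "Document 1": ["the", "cow", "says", "moo"],
    "Document 2": ["the", "cat", "and", "the", "hat"],
    "Document 3": ["the", "dish", "ran", "away", "with", "the", "spoon"],
}

def invert(forward):
    """Build word -> set of documents from document -> words."""
    inverted = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return inverted

inverted_index = invert(forward_index)
print(sorted(inverted_index["the"]))  # all three documents
print(sorted(inverted_index["cat"]))  # only Document 2
```

Looking up a query word is then a single dictionary access instead of a scan over every document.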

Page 14: Introduction to enterprise search

How Search Works: Searching the Data
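The slide's diagram is not captured in this transcript. Since the references list TF-IDF, cosine similarity, and the vector space model, here is a minimal ranking sketch along those lines. It is one textbook formulation (raw term counts, natural-log IDF), not necessarily what any particular engine does, and it reuses the three example documents from the indexing slide.

```python
import math
from collections import Counter

documents = {
    "Document 1": "the cow says moo",
    "Document 2": "the cat and the hat",
    "Document 3": "the dish ran away with the spoon",
}

def tfidf_vector(tokens, idf):
    """Weight each known term by (count in text) * (inverse document frequency)."""
    counts = Counter(tokens)
    return {t: counts[t] * idf[t] for t in counts if t in idf}

def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

tokenized = {d: text.split() for d, text in documents.items()}
doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
idf = {t: math.log(len(tokenized) / df) for t, df in doc_freq.items()}
doc_vectors = {d: tfidf_vector(tokens, idf) for d, tokens in tokenized.items()}

def search(query):
    """Return matching documents, best cosine score first."""
    q = tfidf_vector(query.lower().split(), idf)
    scored = [(cosine(q, v), d) for d, v in doc_vectors.items()]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

print(search("the cat"))
print(search("spoon moo"))
```

Because "the" occurs in every document its IDF is zero, so it contributes nothing to the score; only the rarer words decide the ranking.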

Page 15: Introduction to enterprise search

How Search Works: Summary

Page 16: Introduction to enterprise search

Page 17: Introduction to enterprise search

Selected Features: Architecture

Distributed computing capabilities

Supports building highly scalable, high-performance, fault-tolerant clusters

Index replication and load balancing

Near-real-time indexing

….

For Developers and System Integrators

API Access for Indexing and Searching

Ability to build custom connectors

Advanced, configurable language analysis

Configurable relevancy and ranking

….

Page 18: Introduction to enterprise search

Selected Features: Faceted Search and Filtering
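Faceting boils down to counting field values across the current result set and letting the user filter on one of them. The sketch below illustrates that idea on a handful of invented documents; the field names (`type`, `department`) are assumptions, not taken from the slides.

```python
from collections import Counter

# Hypothetical result set with two facet fields.
documents = [
    {"title": "Leave policy", "type": "PDF", "department": "HR"},
    {"title": "Onboarding guide", "type": "Word", "department": "HR"},
    {"title": "VPN setup", "type": "PDF", "department": "IT"},
    {"title": "Laptop request form", "type": "Word", "department": "IT"},
    {"title": "Security policy", "type": "PDF", "department": "IT"},
]

def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs)

def apply_filters(docs, **filters):
    """Keep only documents whose fields match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in filters.items())]

print(facet_counts(documents, "type"))

# After the user clicks the "PDF" facet, counts are recomputed on the subset.
pdfs = apply_filters(documents, type="PDF")
print(facet_counts(pdfs, "department"))
```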

Page 19: Introduction to enterprise search

Selected Features: Multimedia Search (Filter by Image Attributes)

Page 20: Introduction to enterprise search

Selected Features: Advanced Text Analysis

Language detection + Tokenization + Normalization

Arabic (all NLP features: morphology, normalization, translation, named entities, synonyms, and more …)

Farsi (Persian), Urdu, Pashto, Cyrillic, Chinese/Japanese/Korean, … and others
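Tokenization and normalization can be sketched with the Python standard library alone. This is a deliberately simple approximation (case folding, removal of combining marks, splitting on word characters); real language analysis for Arabic or CJK text needs morphology and segmentation that this example does not attempt.

```python
import re
import unicodedata

def normalize(text):
    """Case-fold and strip combining marks (accents, Arabic diacritics)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def tokenize(text):
    """Split normalized text into runs of word characters."""
    return re.findall(r"\w+", normalize(text))

print(tokenize("Résumé, RESUME"))
print(tokenize("Enterprise-Search 2015"))
```

With this normalization a query for "resume" matches documents that spell the word with or without accents, in any letter case.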

Page 21: Introduction to enterprise search

Selected Features: Entity Extraction Enables “Discovery”

Languages:

Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Pashto, Persian (Farsi and Dari), Portuguese, Russian, Spanish, Urdu, …

Page 22: Introduction to enterprise search

Selected Features: Synonyms

“DB administrator” is defined as a synonym of “Database Administrator”.

This synonymy can be in one direction or both ways.
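The difference between one-way and two-way synonyms can be shown with a small query-expansion table. The helper names below (`add_synonym`, `expand_query`) are hypothetical and not taken from any product; the terms come from the slide's example.

```python
def add_synonym(table, term, synonym, both_ways=False):
    """Register synonym for term; optionally register the reverse mapping too."""
    table.setdefault(term.lower(), set()).add(synonym.lower())
    if both_ways:
        table.setdefault(synonym.lower(), set()).add(term.lower())

def expand_query(query, table):
    """Return the query followed by any synonyms registered for it."""
    key = query.lower()
    return [key] + sorted(table.get(key, set()))

one_way = {}
add_synonym(one_way, "DB administrator", "Database Administrator")

two_way = {}
add_synonym(two_way, "DB administrator", "Database Administrator", both_ways=True)

print(expand_query("DB administrator", one_way))
print(expand_query("Database Administrator", one_way))
print(expand_query("Database Administrator", two_way))
```

With the one-way table, searching for "Database Administrator" is not expanded; with the two-way table, either phrase retrieves documents containing the other.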

Page 23: Introduction to enterprise search

Selected Features

Name Indexing (cross-language “People Search”).

Page 24: Introduction to enterprise search

Selected Features: Multilingual Search (Cross-Language Information Retrieval)

Afghanistan

Page 25: Introduction to enterprise search

Selected Features: Taxonomy (Categorizer)

Predicts the category of a new document using an existing training dataset (for example, DMOZ)

Business

Consumer Services

Inquiries

Customer Service

Shopping

Pets
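Predicting a category from a training set can be illustrated with a tiny multinomial Naive Bayes classifier. The slides do not say which algorithm the categorizer uses, so this is only one common choice; the four training sentences are invented, with labels borrowed from the category names above.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count words per category and documents per category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def predict(model, text):
    """Pick the category with the highest log-probability (Laplace smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                count = word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training set.
model = train([
    ("dog food and cat toys", "Pets"),
    ("leash for puppy and dog", "Pets"),
    ("discount shoes and bags", "Shopping"),
    ("buy shoes online discount", "Shopping"),
])

print(predict(model, "puppy toys"))
print(predict(model, "discount bags"))
```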

Page 26: Introduction to enterprise search

Selected Features: Geospatial Search

• Limit search queries to a geographic area

• Users can draw polygon and circle shapes to refine search results to desired areas

• Multiple areas can be selected for a single query
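The circle and polygon filters can be sketched as follows. This is an illustrative approximation: the circle test uses the haversine great-circle distance, while the polygon test treats latitude and longitude as flat coordinates, which is only reasonable for small areas. The office names and coordinates are made-up sample data.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def in_circle(point, center, radius_km):
    return haversine_km(point, center) <= radius_km

def in_polygon(point, polygon):
    """Ray casting: a point is inside if a ray from it crosses an odd number of edges."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def geo_filter(docs, areas):
    """Keep documents whose location falls inside at least one selected area."""
    return [name for name, location in docs.items()
            if any(area(location) for area in areas)]

# Hypothetical documents tagged with (lat, lon).
offices = {
    "Cairo office": (30.04, 31.24),
    "Alexandria office": (31.20, 29.92),
    "Dubai office": (25.20, 55.27),
}

def near_cairo(point):
    return in_circle(point, (30.04, 31.24), 250)

def gulf_box(point):
    return in_polygon(point, [(24, 54), (24, 56), (26, 56), (26, 54)])

print(geo_filter(offices, [near_cairo]))
print(geo_filter(offices, [near_cairo, gulf_box]))
```

Passing several areas to `geo_filter` corresponds to selecting multiple areas for a single query.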

Page 27: Introduction to enterprise search

Selected Features

Enterprise Search as a NoSQL Database

NoSQL data store: a non-traditional data store that is not built around SQL, has a distributed, fault-tolerant architecture, and is built to provide high performance.

Page 28: Introduction to enterprise search

Selected Features: Enterprise Search as a BI Platform

Page 29: Introduction to enterprise search

Other Features

Spell checking

Query suggestion

Autosuggest

Search Alerts

Document Thumbnails

Sentiment analysis

Targeted ads and document boosting

Recommendations (“More Like This”)

Translation, visualization, …
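Two of the features above, spell checking and autosuggest, can be approximated with the standard library. The sketch uses `difflib.get_close_matches` for "did you mean" and a plain prefix match for suggestions; the term list is invented, and production engines typically draw candidates from the index and rank them by popularity.

```python
import difflib

# Hypothetical dictionary of indexed terms.
INDEXED_TERMS = ["search", "server", "sharepoint", "solr", "security"]

def did_you_mean(word, terms=INDEXED_TERMS):
    """Return the closest indexed term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), terms, n=1, cutoff=0.6)
    return matches[0] if matches else None

def autosuggest(prefix, terms=INDEXED_TERMS, limit=5):
    """Return indexed terms that start with what the user has typed so far."""
    prefix = prefix.lower()
    return [t for t in sorted(terms) if t.startswith(prefix)][:limit]

print(did_you_mean("serach"))
print(autosuggest("se"))
```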

Page 30: Introduction to enterprise search

Page 31: Introduction to enterprise search

Search Market

Market size: as of 2012, total annual sales of search software may amount to only $3 billion at most, and there are probably no more than 80 companies in the business.

Vendors: Exalead, Google, Oracle, Attivio, HP, …

System integrators: a number of systems integration companies now specialize in search implementation projects, offering a range of services.

Open-source search: getting much stronger since Solr appeared in 2006, with different business models.

Appliances: started with Google and Autonomy, and now extend to Solr.

Cloud: cloud-based search-as-a-service offerings, led by Amazon and Windows Azure.

Specialized Search Components: NLP Components, and Document Filters

Page 32: Introduction to enterprise search

Selected Market Players

• Lexmark - Isys-Search

Page 33: Introduction to enterprise search

References

Wikipedia: Web Crawler, Search Engine Indexing, TF-IDF, Cosine Similarity, Vector Space Model

Gartner: Gartner Magic Quadrant for Enterprise Search

Articles: “NoSQL, Lucene, and Solr”, “TF-IDF for Dummies”, “TF-IDF and Cosine Similarity”

Blogs: Exalead Blog, Attivio Blog, Enterprise Search Blog, LucidWorks Blog

Books: Enterprise Search (O’Reilly, 2012), An Introduction to Information Retrieval (Cambridge UP, 2009)

Slides: Exploring Search-Driven Applications with SharePoint 2013

Academic: Information Retrieval Course (Cornell University), Information Retrieval and Web Search (SFU), Search Engine Architecture (HPI)

Page 34: Introduction to enterprise search

Thank You