Introduction to Enterprise Search

Post on 16-Jul-2015

Category:

Technology


SURE Internal Training

Works at SURE Technology & Consulting as Technical Team Leader and Enterprise Search Consultant.

In love with ASP.NET, SharePoint, ALM, and software architecture; involved in search technology and search solutions since 2007.

Has working Experience with

Profile: http://www.linkedin.com/in/usamanada

Twitter: https://twitter.com/usama_nada

Let's Start

Know what enterprise search is

How Search Works

The business of Search

General overview

Problems it came to solve

Different repositories

Data in many formats.

Very Large Volumes.

Security Concerns

Poor relevancy offered by database solutions

High query rates (queries per second) overloading your database

….

What is Enterprise Search?

It helps you find your stuff…

Give me a better definition…

Search-Based Application

A software application in which a search engine platform is used as the core infrastructure for information access and reporting, and whose main purpose is performing a domain-oriented task.

Search Engine

Effectiveness (quality of results)

As good as possible

Efficiency (response time and throughput)

As quickly as possible

High-level overview of search concepts and architecture

How Search Works: Getting the Data

Crawlers

Web Crawler

Focused Crawler

Connectors

Database

ECM

CRM

Exchange

Files

How Search Works: Process the Data (Indexing)

Forward Index

Document      Words
Document 1    the, cow, says, moo
Document 2    the, cat, and, the, hat
Document 3    the, dish, ran, away, with, the, spoon
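The forward index above maps each document to its words. A search engine normally inverts this mapping so that a word leads straight to the documents containing it. The inverted index is not shown in the surviving slide text, so the sketch below is an illustration only, and the name build_inverted_index is hypothetical.

```python
from collections import defaultdict

# Forward index from the slide: document -> words, in order of appearance.
forward_index = {
    "Document 1": ["the", "cow", "says", "moo"],
    "Document 2": ["the", "cat", "and", "the", "hat"],
    "Document 3": ["the", "dish", "ran", "away", "with", "the", "spoon"],
}


def build_inverted_index(forward):
    """Map each word to the sorted list of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            postings[word].add(doc_id)
    return {word: sorted(docs) for word, docs in postings.items()}


inverted_index = build_inverted_index(forward_index)
```

Looking up a query term is then a single dictionary access (for example, inverted_index["cow"]) instead of a scan over every document.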

How Search Works: Search the Data
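The content of this slide did not survive extraction, but the deck's references list TF-IDF, cosine similarity, and the vector space model. The sketch below shows one simple way those pieces can rank documents against a query. It assumes a plain log(N / df) weighting and reuses the three sample documents from the indexing slide; real engines use more refined scoring, and all function names here are hypothetical.

```python
import math
from collections import Counter

docs = {
    "Document 1": ["the", "cow", "says", "moo"],
    "Document 2": ["the", "cat", "and", "the", "hat"],
    "Document 3": ["the", "dish", "ran", "away", "with", "the", "spoon"],
}


def idf(term, corpus):
    """Inverse document frequency: log(N / df), or 0.0 for unseen terms."""
    df = sum(1 for words in corpus.values() if term in words)
    return math.log(len(corpus) / df) if df else 0.0


def tfidf_vector(words, corpus):
    """Sparse TF-IDF vector as a dict of term -> weight."""
    counts = Counter(words)
    return {term: count * idf(term, corpus) for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_words, corpus):
    """Return matching document ids, best score first."""
    query_vec = tfidf_vector(query_words, corpus)
    scored = [
        (cosine(query_vec, tfidf_vector(words, corpus)), doc_id)
        for doc_id, words in corpus.items()
    ]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]
```

Because "the" occurs in every document, its weight is zero under this scheme, so a query for "the cat" is decided entirely by "cat".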

How Search Works: Summary

Selected Features: Architecture

Distributed computing capabilities

Supports building highly scalable, high-performance, fault-tolerant clusters

Index replication, load balancing

Near Real-Time Indexing

….

For Developers and System Integrators

API Access for Indexing and Searching

Ability to build custom connectors

Advanced configurable Language Analysis

Configurable relevancy and ranking

….

Selected Features: Faceted Search and Filtering
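The screenshot for this slide is missing, so here is a minimal sketch of the idea: a facet is a count of how many results carry each value of a field, and selecting a facet value filters the results. The result set, field names, and function names below are made up for illustration.

```python
from collections import Counter

# Hypothetical result set; titles, fields, and values are illustrative only.
results = [
    {"title": "Q1 report", "type": "pdf", "department": "Finance"},
    {"title": "Q2 report", "type": "pdf", "department": "Finance"},
    {"title": "Hiring plan", "type": "docx", "department": "HR"},
    {"title": "Budget", "type": "xlsx", "department": "Finance"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of a field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filter(docs, field, value):
    """Keep only the documents whose field equals the selected facet value."""
    return [doc for doc in docs if doc.get(field) == value]


type_facet = facet_counts(results, "type")
finance_only = apply_filter(results, "department", "Finance")
```

Recomputing facet_counts on finance_only gives the narrowed counts a user would see after clicking the "Finance" facet.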

Selected Features: Multimedia Search (filter by image attributes)

Selected Features: Advanced Text Analysis

Language detection + Tokenization + Normalization

Arabic (all NLP features: morphology, normalization, translation, named entities, synonyms, and more …)

Farsi (Persian), Urdu, Pashto, Cyrillic, Chinese/Japanese/Korean, and others
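As a rough illustration of the tokenization and normalization steps named above, the sketch below case-folds text, strips combining marks (Latin accents as well as Arabic diacritics), and splits on non-word characters. It uses only Python's standard library and is an assumption about what such a pipeline might do, not the behavior of any particular product. It does not cover language detection, morphology, or CJK word segmentation.

```python
import re
import unicodedata


def normalize(text):
    """Case-fold and strip combining marks (accents, Arabic diacritics)."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def tokenize(text):
    """Split normalized text into word tokens on non-word characters."""
    return re.findall(r"\w+", normalize(text))
```

With this normalization, a query typed without accents or diacritics produces the same tokens as indexed text written with them.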

Selected Features: Entity Extraction Enables “Discovery”

Languages:

Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Pashto, Persian (Farsi, Dari), Portuguese, Russian, Spanish, Urdu, …

Selected Features: Synonyms

“DB administrator” is defined as a synonym of “Database Administrator”.

This synonymy can be in one direction or both ways.
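The difference between one-direction and two-way synonyms can be sketched as query expansion. The rule format and the name expand_query are hypothetical; real engines usually configure this in a synonyms file or dictionary.

```python
def expand_query(query, one_way, two_way):
    """Return the set of phrases to search for, given synonym rules.

    one_way: dict mapping a phrase to the phrases it should also match.
    two_way: list of sets of phrases that are all interchangeable.
    """
    key = query.lower()
    expanded = {key}
    expanded.update(one_way.get(key, ()))
    for group in two_way:
        if key in group:
            expanded.update(group)
    return expanded


# One direction: "DB administrator" also finds "Database Administrator",
# but not the other way round.
one_way = {"db administrator": {"database administrator"}}

# Both ways: either phrase finds the other.
two_way = [{"db administrator", "database administrator"}]
```

With the one-way rule only, a search for "Database Administrator" is left unexpanded; with the two-way rule, both phrases expand to the same pair.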

Selected Features: Name Indexing (cross-language “People Search”)

Selected Features: Multilingual Search (Cross-Language Information Retrieval)

Afghanistan

Selected Features: Taxonomy (Categorizer)

Predict the category of a new document using an existing training dataset (for example, DMOZ)

Business

Consumer

Services

Inquiries

Customer Service

Shopping

Pets
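Predicting a category from a training dataset can be sketched with a tiny naive Bayes classifier. The slide does not say which algorithm a given product uses; naive Bayes is swapped in here only as a common baseline. The training examples are made up, and only the category names "Pets" and "Shopping" come from the slide.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Collect per-category word counts from (category, words) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, words in labeled_docs:
        word_counts[category].update(words)
        doc_counts[category] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(words, model):
    """Return the category with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in words:
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training data standing in for a real labeled dataset.
training = [
    ("Pets", ["dog", "food", "leash"]),
    ("Pets", ["cat", "litter", "food"]),
    ("Shopping", ["discount", "cart", "checkout"]),
    ("Shopping", ["coupon", "cart", "sale"]),
]
model = train(training)
```

A new document is tokenized the same way as the training data and passed to predict, for example predict(["cat", "food"], model).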

Selected Features: Geospatial Search

• Limit search queries to a geographic area

• Users can draw polygon and circle shapes to refine search results to the desired areas

• Multiple areas can be selected for a single query
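The three points above can be sketched as a filter that keeps a document when its location falls inside any of the selected shapes. This is an illustration under simplifying assumptions: circles use the haversine distance, polygons use a ray-casting test that treats latitude and longitude as planar coordinates (reasonable only for small areas away from the poles and the date line), and the documents and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(point, center, radius_km):
    """True if point lies within radius_km of center."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


def in_polygon(point, vertices):
    """Ray-casting test on (lat, lon) pairs, treated as planar coordinates."""
    lat, lon = point
    inside = False
    for i in range(len(vertices)):
        lat1, lon1 = vertices[i]
        lat2, lon2 = vertices[i - 1]
        if (lat1 > lat) != (lat2 > lat):
            crossing_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing_lon:
                inside = not inside
    return inside


def geo_filter(docs, areas):
    """Keep documents whose location falls inside at least one selected area."""
    return [d for d in docs if any(area(d["location"]) for area in areas)]


# Hypothetical documents with (lat, lon) locations.
offices = [
    {"title": "Cairo office", "location": (30.05, 31.24)},
    {"title": "Alexandria office", "location": (31.2001, 29.9187)},
    {"title": "Riyadh office", "location": (24.7136, 46.6753)},
]

# Two areas selected for a single query: a circle and a polygon.
cairo_circle = lambda p: in_circle(p, (30.0444, 31.2357), 50)
riyadh_box = lambda p: in_polygon(p, [(24, 46), (24, 47), (25, 47), (25, 46)])
selected = geo_filter(offices, [cairo_circle, riyadh_box])
```

Representing each area as a predicate keeps geo_filter independent of the shape type, so further shapes can be added without changing it.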

Selected Features: Enterprise Search as a NoSQL Database

NoSQL data store: a non-traditional data store, not built around SQL, with a distributed, fault-tolerant architecture, built to provide high performance.

Selected Features: Enterprise Search as a BI Platform

Other Features

Spell checking

Query suggestion

Autosuggest

Search Alerts

Document Thumbnails

Sentiment analysis

Targeted ads and document boosting

Recommendations (“More Like This”)

Translation, visualization, …

Search Market

Market size: As of 2012, the total annual sales of search software may amount to $3 billion at most, and there are probably no more than 80 companies in the business.

Vendors: Exalead, Google, Oracle, Attivio, HP, ….

System integrators: There are now a number of systems integration companies that specialize in search implementation projects, offering a range of services.

Open source search: Getting much stronger since Solr's appearance in 2006, with different business models.

Appliances: Started with Google and Autonomy, and now extend to Solr.

Cloud: Cloud-based search-as-a-service applications, led by Amazon and Windows Azure.

Specialized search components: NLP components and document filters.

Selected Market Players

• Lexmark - Isys-Search

References

Wikipedia: Web Crawler, Search Engine Indexing, TF-IDF, Cosine Similarity, Vector Space Model

Gartner: Gartner Magic Quadrant for Enterprise Search

Articles: “NoSQL, Lucene, and Solr”, TF-IDF for Dummies, TF-IDF and cosine similarity

Blogs: Exalead Blog, Attivio Blog, Enterprise Search Blog, LucidWorks Blog

Books: Enterprise Search (O’Reilly, 2012), An Introduction to Information Retrieval (Cambridge UP, 2009)

Slides: Exploring search driven applications with SharePoint 2013

Academic: Information Retrieval Course (Cornell University), Information Retrieval and Web Search (SFU), Search Engine Architecture (HPI)

Thank You
