SPCUA 2013: Alexey Kozhemiakin, Enterprise Search

Posted on 28-Jan-2015


English version of my slides from SPCUA 2013


May 22nd 2013, Kiev

Enterprise search portals in SharePoint 2013

or “How to make a cool search”

Alexey Kozhemiakin

Who’s speaking to you?

• Solution Architect @epam
• Focusing on search:
  • SharePoint Search (FAST / 2010 / 2013)
  • Apache Lucene, Solr, elasticsearch, Oracle Endeca…
• http://powersearching.wordpress.com

Agenda

• Enterprise search portal
• Insight into SP2013 search
• Key changes from SP2010
• A bit of magic: relevancy calculation
• Search governance, useful hints & tips

Key search patterns

• I know what I’m searching for and where to find it
• I know what I’m searching for but don’t know where to find it
• I don’t know what I’m searching for

http://aghy.hu/AghyBlog_EN/Lists/Posts/Post.aspx?ID=199

Enterprise Search Portal

• Demand:
  • Fast-growing enterprises
  • A zoo of internal systems
• Solution:
  • A “Google” inside the enterprise
• Quick wins for the business:
  • A single point of smart search and information retrieval
  • Less time spent searching by employees
  • Better internal communication and simpler reuse of content

But after deployment…

• «…Search sucks»
• Out-of-the-box search knows nothing about you
• The typical “buts”:
  • «…Microsoft takes care of a decent search algorithm»
  • «…we’re not sure we can do better»
  • «…we don’t need search, everybody knows where the content is»
  • «…make our search like Facebook/Google/Bing» (instead of requirements)

Why it’s hard

• Short, ambiguous queries
• Unstructured, unoptimized content
• Content users and content creators have different active vocabularies
• Limited resources ($), whereas internet search has:
  • Automatic and manual testing of search quality (assessors)
  • Continuous improvement


Search architecture in SP2013

Search is a two-phase process

• Matching: all documents that contain the keywords
  • Linguistics: stemming, phonetics
  • Synonyms
• Ranking by “features”:
  • TF-IDF, BM25
  • Field weights
  • File type
  • Modification date
  • Popularity
  • …
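TF-IDF and BM25 are the classic text-matching features in the ranking list. A minimal Okapi BM25 sketch follows, assuming plain lowercase whitespace tokenization and the common textbook defaults k1 = 1.2 and b = 0.75; SharePoint applies its own linguistics and per-property parameters, which are not reproduced here.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Tokenization is a lowercase whitespace split (an assumption for the
    sketch; a real engine would stem and normalize first).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF with the "+1" smoothing that keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "enterprise search portal",
    "sharepoint search ranking and search relevance",
    "team lunch menu",
]
print(bm25_scores("search ranking", docs))
```

The second document wins because it contains both terms, and the rarer term "ranking" carries a higher IDF than "search".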


Ranking in FAST

• Linear combination of features
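A linear combination means each component's value is multiplied by a weight and the products are summed, which is also what makes a per-component breakdown of the final rank possible. A sketch; the component names mirror the chart on the next slide, while the weights and values are made up and are not FAST's actual rank profile.

```python
def linear_rank(features, weights):
    """Return (total score, per-component contributions).

    `features` and `weights` map a component name to a number; a component
    missing from `features` contributes 0.
    """
    contributions = {name: w * features.get(name, 0.0) for name, w in weights.items()}
    return sum(contributions.values()), contributions

# Hypothetical weights and feature values, for illustration only.
weights = {"term:fast": 1.0, "term:search": 1.0, "freshness": 0.5,
           "static rank": 2.0, "proximity": 1.5}
doc = {"term:fast": 1200.0, "term:search": 900.0, "freshness": 400.0,
       "static rank": 1000.0, "proximity": 200.0}

total, parts = linear_rank(doc, weights)
print(total, parts)
```

Because the model is additive, each entry in `parts` is exactly that component's share of `total`.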

Ranking in FAST

• Impact of each component on the final rank

(Chart: rank score, on a 0 to 8000 scale, of the 1st to 4th results, broken down by component: term:fast, term:search, freshness, static rank, proximity)

Migration from FAST to SP2013


Ranking in SP2013

Ranking in SP2013

• Default relevancy model
• Two neural networks
• Freshness is not included in ranking
• Features:

  Type              Instance
  BM25              BM25
  Static            UrlDepth
  BucketedStatic    InternalFileType
  BucketedStatic    Language
  Static            ClickDistance
  Static            QueryLogClicks
  Static            QueryLogSkips
  Static            LastClicks
  Static            EventRate
  MinSpan - soft    Title
  MinSpan - soft    Title
  MinSpan - soft    Title
  MinSpan - soft    Content
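The Type column describes how a feature turns a document property into a number. A hedged sketch of two of these ideas follows. A Static feature passes a numeric property (such as UrlDepth or ClickDistance) through a saturating transform. A BucketedStatic feature treats the property (such as InternalFileType) as a category with one weight per bucket. The transform 1 / (1 + k·x), the value of k, and the bucket weights are illustrative assumptions, not SharePoint's shipped parameters, and the function names are hypothetical.

```python
def decreasing_transform(x, k):
    """Saturating transform: 1.0 at x = 0, falling towards 0 as x grows.

    Suits "smaller is better" properties such as URL depth or click distance.
    """
    return 1.0 / (1.0 + k * x)

def bucketed(value, bucket_weights, default=0.0):
    """One weight per category, i.e. a one-hot vector times a weight vector."""
    return bucket_weights.get(value, default)

# Hypothetical bucket weights for a file-type feature.
file_type_weights = {"html": 0.2, "docx": 0.5, "pptx": 0.4, "xlsx": 0.1}

shallow = decreasing_transform(1, k=0.5)  # e.g. a page one level deep
deep = decreasing_transform(6, k=0.5)     # e.g. a page six folders deep
docx_score = bucketed("docx", file_type_weights)
print(shallow, deep, docx_score)
```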


Ranking in SP2013

• Default relevancy model
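The default model feeds the normalized features into small neural networks. The sketch below is a generic network with one hidden layer of tanh units. The layer size, the feature order, and every weight are invented for illustration; the real structure and values should be read from the rank model itself.

```python
import math

def nn_score(features, hidden_weights, hidden_bias, output_weights):
    """score = sum_j v_j * tanh(w_j . x + b_j) for a one-hidden-layer network."""
    score = 0.0
    for w_j, b_j, v_j in zip(hidden_weights, hidden_bias, output_weights):
        activation = sum(w * x for w, x in zip(w_j, features)) + b_j
        score += v_j * math.tanh(activation)
    return score

# Hypothetical: 3 normalized features (bm25, click_distance, url_depth), 2 hidden units.
hidden_weights = [[1.5, 0.8, 0.4], [0.2, -0.5, 1.0]]
hidden_bias = [-0.3, 0.1]
output_weights = [1.0, 0.5]

relevant = nn_score([0.9, 0.7, 0.8], hidden_weights, hidden_bias, output_weights)
irrelevant = nn_score([0.1, 0.2, 0.3], hidden_weights, hidden_bias, output_weights)
print(relevant, irrelevant)
```

Unlike the additive FAST model, the tanh units make a feature's contribution depend on the other features, which is one reason the score is harder to explain by hand.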

Explain rank

• /_layouts/15/explainrank.aspx
• rankdetail property

Explain rank

• Manual validation in Excel

Search Governance

1. Search analytics
2. Fine tuning and adaptation
3. Regular testing
4. Security assessment
5. Promotion within the company
6. Content optimization and basic SEO

1. Search analytics

• Search analytics
• Search analytics
• Search analytics
• Obey! Use search analytics

1. Search analytics

• OOTB in SP2013:
  • Most popular queries
  • «No results» / abandoned queries
• 3rd-party tools (Google Analytics, Omniture, WebTrends)
• Measure search quality (!):
  • % of clicks on results
  • Which results are clicked
  • Returns after clicks
• Session analysis
• Query segmentation
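Several of these quality signals can be computed directly from a query log. A sketch, assuming a hypothetical log record with the query, the number of results returned, and the positions the user clicked; real logs from SharePoint or a web analytics tool will have a different shape.

```python
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    # Hypothetical log record; field names are assumptions for illustration.
    query: str
    result_count: int
    clicked_positions: list = field(default_factory=list)

def search_metrics(log):
    total = len(log)
    no_results = sum(1 for e in log if e.result_count == 0)
    # Abandoned: results were shown but nothing was clicked.
    abandoned = sum(1 for e in log if e.result_count > 0 and not e.clicked_positions)
    with_clicks = [e for e in log if e.clicked_positions]
    clicks = [p for e in with_clicks for p in e.clicked_positions]
    return {
        "no_result_rate": no_results / total,
        "abandonment_rate": abandoned / total,
        "click_through_rate": len(with_clicks) / total,
        # Mean position of clicked results: lower means answers sit higher up.
        "mean_click_position": sum(clicks) / len(clicks) if clicks else None,
    }

log = [
    LogEntry("vacation policy", 12, [1]),
    LogEntry("vpn", 40, [3, 1]),
    LogEntry("hr portal", 5),
    LogEntry("expnse report", 0),
]
m = search_metrics(log)
print(m)
```

The misspelled "expnse report" returning nothing is the kind of query that points to a missing synonym or spelling rule.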

Query segmentation

• Analyze and improve not only the top N queries, but whole classes of queries
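Segmenting means assigning each query to a class and tracking metrics per class, so one fix can help many long-tail queries at once. A rule-based sketch; the class names, regular expressions, and log format are hypothetical examples.

```python
import re
from collections import defaultdict

# Hypothetical query classes; the first matching rule wins.
CLASSES = [
    ("person", re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")),
    ("how-to", re.compile(r"^how (to|do|can)\b", re.IGNORECASE)),
    ("ticket id", re.compile(r"^[A-Z]+-\d+$")),
]

def classify(query):
    for name, pattern in CLASSES:
        if pattern.search(query):
            return name
    return "other"

def segment(log):
    """Group (query, clicked) pairs by class; return click-through rate per class."""
    stats = defaultdict(lambda: [0, 0])  # class -> [queries, queries with a click]
    for query, clicked in log:
        s = stats[classify(query)]
        s[0] += 1
        s[1] += int(clicked)
    return {cls: clicked / total for cls, (total, clicked) in stats.items()}

log = [("John Smith", True), ("Anna Petrenko", False),
       ("how to book a meeting room", False), ("PROJ-1234", True),
       ("budget 2013", True)]
rates = segment(log)
print(rates)
```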

2. Fine tuning

• Authoritative pages
  • Quick win: content source priority
• Query rules
  • Smart search for users
• Synonyms
  • Separate mapping file
  • Expansion only
  • Term set synonyms do NOT work
• Relevancy models
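“Expansion only” means a synonym adds alternatives to the query rather than replacing the user's term. A sketch of that behaviour that builds a KQL-style OR group; the mapping-file format (`term: synonym, synonym` per line) and both function names are hypothetical stand-ins for whatever separate mapping file you maintain.

```python
def load_synonyms(text):
    """Parse lines like 'vpn: remote access, virtual private network'."""
    mapping = {}
    for line in text.strip().splitlines():
        term, _, alternatives = line.partition(":")
        mapping[term.strip().lower()] = [a.strip() for a in alternatives.split(",") if a.strip()]
    return mapping

def expand(query, mapping):
    """Keep every original term and OR in its synonyms (one-way expansion)."""
    parts = []
    for term in query.split():
        alternatives = mapping.get(term.lower(), [])
        if alternatives:
            # Multi-word synonyms are quoted so they match as phrases.
            quoted = [term] + [f'"{a}"' if " " in a else a for a in alternatives]
            parts.append("(" + " OR ".join(quoted) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

mapping = load_synonyms("""
vpn: remote access, virtual private network
cv: resume
""")
print(expand("vpn setup", mapping))
```

The expansion is one-way: a query for "resume" is not expanded to "cv" unless a reverse entry is added.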

Authoritative Pages

• Impact ClickDistance
• ClickDistance and UrlDepth have a high impact on the total score (see Explain rank)
• Configured in Central Administration (CA) or via CSOM
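Click distance is the number of links between an authoritative page and a document, so marking a page as authoritative shortens the distance to everything it links to. A breadth-first search sketch over a hypothetical link graph; treating all authoritative pages as one level is a simplification, since SharePoint lets you assign several levels of authority.

```python
from collections import deque

def click_distance(links, authoritative):
    """Fewest clicks from any authoritative page to each reachable page.

    `links` maps a URL to the URLs it links to. Unreachable pages are
    absent from the result.
    """
    distance = {url: 0 for url in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

links = {
    "/home": ["/hr", "/it"],
    "/hr": ["/hr/policies"],
    "/hr/policies": ["/hr/policies/vacation.docx"],
    "/it": ["/it/vpn"],
}
d = click_distance(links, ["/home"])
print(d)
```

Adding "/hr/policies" as a second authoritative page would cut the distance of vacation.docx from 3 to 1.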

Query Rules (Rule + Action)

• The tool to make search smarter
• Interactive feedback to user queries
• Post-processing of queries
• Leverage navigational queries
• …

Conditions for Query Rules

• Query Matches Keyword Exactly
• Advanced Query Text Match
• Query Matches Dictionary Exactly
• Query Contains Action Term
• Query More Common in Source
• Result Type Commonly Clicked
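Two of these conditions are easy to picture as predicates over the query text. The sketch below approximates “Query Matches Keyword Exactly” and “Query Contains Action Term”. The function names, the case and whitespace normalization, and the rule that an action term sits at the start or end of the query are assumptions for illustration, not how SharePoint implements them.

```python
def matches_keyword_exactly(query, keywords):
    """True when the whole query (ignoring case and extra spaces) is one of the keywords."""
    normalized = " ".join(query.lower().split())
    return normalized in {k.lower() for k in keywords}

def split_action_term(query, action_terms):
    """Return (action_term, remaining_query) if the query starts or ends with an action term."""
    words = query.split()
    lowered = [w.lower() for w in words]
    for term in action_terms:
        t = term.lower().split()
        n = len(t)
        if len(words) <= n:
            continue  # an action term alone leaves nothing to search for
        if lowered[:n] == t:
            return term, " ".join(words[n:])
        if lowered[-n:] == t:
            return term, " ".join(words[:-n])
    return None

print(split_action_term("download Visual Studio", ["download", "how to"]))
```

The remaining query ("Visual Studio") is what a rule action would search for, for example in a software-download source.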

Actions for Query Rules

• Create and display a result block
• Change ranked search results
  • Best Bets
  • XRANK
    • Adds to the total rank score
    • Not explained in rankdetail
    • How do you choose the correct value?
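Because an XRANK constant boost is added to the rank score and does not appear in rankdetail, one way to pick a value is to look at the score gaps between results offline. A sketch with made-up scores, assuming the boost and the rank scores are on the same scale as the slide's “adds to the total rank score” suggests. The query string uses the KQL `XRANK(cb=…)` constant-boost form; the helper functions are hypothetical.

```python
def xrank_query(query, boost_query, cb):
    """Build a KQL query adding constant boost `cb` to results that match `boost_query`."""
    return f"({query}) XRANK(cb={cb}) ({boost_query})"

def boost_gap(scores, target):
    """Any boost strictly greater than this value moves `target` to first place."""
    best_other = max(s for doc, s in scores.items() if doc != target)
    return max(0.0, best_other - scores[target])

def rerank(scores, boosted, cb):
    """Add `cb` to every document in `boosted` and sort by the new score."""
    new_scores = {doc: s + (cb if doc in boosted else 0.0) for doc, s in scores.items()}
    return sorted(new_scores, key=new_scores.get, reverse=True)

scores = {"intranet-home": 1800.0, "it-faq": 1450.0, "vpn-guide": 1200.0}
gap = boost_gap(scores, "vpn-guide")  # 600.0
print(xrank_query("vpn", "title:guide", cb=gap + 1))
print(rerank(scores, {"vpn-guide"}, cb=gap + 1))
```

A boost that is small relative to the gaps (say 100 here) changes nothing visible, which is why “how to choose the value” depends on your score distribution.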

Templates for Query Rules

• Typical navigational keywords on our portal:
  • Software, soft, download, install
  • How to
  • Policy, blog
  • Portal
  • Music, video
  • Presentation, documents, report
  • Training, tutorial
  • Book, ebook
• You will have different ones!
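Since every portal's list differs, one way to find your own candidates is to count which words recur at the start or end of logged queries, where action terms usually sit. A sketch over a hypothetical list of query strings; the threshold is arbitrary and the output is only a shortlist for manual review.

```python
from collections import Counter

def candidate_action_terms(queries, min_count=2):
    """Count first and last words of multi-word queries; return the frequent ones."""
    counts = Counter()
    for q in queries:
        words = q.lower().split()
        if len(words) < 2:
            continue
        counts[words[0]] += 1
        if words[-1] != words[0]:
            counts[words[-1]] += 1
    return [(w, c) for w, c in counts.most_common() if c >= min_count]

queries = ["download skype", "download visual studio", "vpn policy",
           "travel policy", "project plan template", "skype download"]
print(candidate_action_terms(queries))
```

Here "skype" shows up next to "download" and "policy", which illustrates why a human still has to decide which words are real action terms.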

Custom Rank Models

• Collect query judgments
• Tune the neural network coefficients using machine learning
  • Gradient descent, LambdaRank
• Microsoft.Office.Server.Search.RankerTuning
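Tuning from judgments means adjusting weights so that documents judged more relevant score higher. LambdaRank is too large for a slide-sized example. A simpler pairwise relative (a RankNet-style logistic loss on a linear scorer, trained by plain gradient descent) shows the idea. The features and judgments are invented, and this does not use the RankerTuning API.

```python
import math

def train_pairwise(examples, n_features, lr=0.5, epochs=500):
    """Full-batch gradient descent on a pairwise logistic loss for a linear scorer.

    `examples` maps each query to a list of (feature_vector, judgment) pairs.
    For every pair where doc i is judged more relevant than doc j, the loss is
    log(1 + exp(-(s_i - s_j))) with s = w . x.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for docs in examples.values():
            for x_i, y_i in docs:
                for x_j, y_j in docs:
                    if y_i <= y_j:
                        continue
                    diff = sum(wk * (a - b) for wk, a, b in zip(w, x_i, x_j))
                    # d/dw log(1 + e^(-diff)) = -(x_i - x_j) / (1 + e^diff)
                    coef = -1.0 / (1.0 + math.exp(diff))
                    for k in range(n_features):
                        grad[k] += coef * (x_i[k] - x_j[k])
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w

def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))

# Hypothetical features: [bm25, click_distance_score]; judgments on a 0..2 scale.
judgments = {
    "vpn": [([0.9, 0.8], 2), ([0.8, 0.1], 0), ([0.3, 0.9], 1)],
    "holidays": [([0.7, 0.9], 2), ([0.9, 0.2], 0)],
}
w = train_pairwise(judgments, n_features=2)
print(w)
```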

Custom Rank Models

• Manually modify a new or very simple model (not the default one!)
• A/B testing of weights
• Measure, measure: precision, NDCG
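Precision and NDCG are what turn A/B testing of weights into numbers. A sketch of precision@k and NDCG@k using the common exponential-gain variant (2^rel − 1) with a log2 position discount. For brevity the ideal ranking is built from the same judged list, an assumption that holds only when every relevant document appears in it.

```python
import math

def precision_at_k(relevances, k):
    """Fraction of the top k results judged relevant (relevance > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def dcg_at_k(relevances, k):
    # Exponential gain with a log2 discount by position (position 1 -> log2(2)).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

ranking = [2, 0, 1, 0]  # invented judgments for results 1..4 of one query
print(precision_at_k(ranking, 3), ndcg_at_k(ranking, 3))
```

Averaging these per query over a fixed judged query set gives one number per model, which makes two weight settings directly comparable.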

Custom Rank Models

• Example of a simple model: people search

3. Search quality testing

• Why do you need it? It’s your compass.
• «Unit testing»
• Periodic manual testing
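A search «unit test» pins down queries whose right answer you already know. A sketch using Python's `unittest`, with a hypothetical `search(query)` function stubbed out; in practice it would call your search service and return result URLs, and the test would run on a schedule.

```python
import unittest

def search(query):
    """Hypothetical stand-in for the real search call; returns ranked URLs."""
    fake_index = {
        "vacation policy": ["/hr/policies/vacation.docx", "/hr/faq.aspx"],
        "vpn": ["/it/faq.aspx", "/it/vpn.aspx"],
    }
    return fake_index.get(query, [])

# Known-answer cases: query -> URL that must appear in the top K results.
EXPECTED = {
    "vacation policy": "/hr/policies/vacation.docx",
    "vpn": "/it/vpn.aspx",
}
TOP_K = 3

class SearchQualityTest(unittest.TestCase):
    def test_expected_results_in_top_k(self):
        for query, url in EXPECTED.items():
            # subTest reports each failing query separately.
            with self.subTest(query=query):
                self.assertIn(url, search(query)[:TOP_K])
```

Saved as a module, it runs with `python -m unittest`; a failing case after a content or relevancy change is the early warning.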

4. Security «audit»

• Search reveals security breaches
  • Security by obscurity
• Example queries:
  • «confidential»
  • Salaries, performance reviews
• Solution: automatic monitoring of sensitive queries
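Automatic monitoring can be a scheduled job that runs a watch list of sensitive queries as an ordinary, low-privileged test account and reports any query that returns something. A sketch; the watch list, the `search_as` function, and the account name are hypothetical.

```python
SENSITIVE_QUERIES = ["confidential", "salary", "performance review", "password"]

def search_as(account, query):
    """Hypothetical stand-in: results visible to `account` for `query`."""
    leaked = {"salary": ["/finance/shared/salaries-2013.xlsx"]}
    return leaked.get(query, [])

def audit(account, queries=SENSITIVE_QUERIES):
    """Return {query: results} for every sensitive query that finds something."""
    findings = {}
    for q in queries:
        results = search_as(account, q)
        if results:
            findings[q] = results
    return findings

report = audit("test-employee")
print(report)
```

Any non-empty report means the permissions, not the search engine, need fixing.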

5. Adoption of content

• Work with departments
  • Get their help with monitoring their queries
• Guidelines for formatting content
• Basic SEO
  • Titles
  • Friendly URLs
  • Custom meta tags <meta name=…
    • Title, description
    • Custom tags automatically appear in crawled properties
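To check which title and `<meta name=…>` tags a page actually exposes before it is crawled, a small sketch with Python's standard `html.parser` can help. It only mimics the extraction step and makes no claim about how the SharePoint crawler maps tags to crawled properties; the sample page is invented.

```python
from html.parser import HTMLParser

class MetaTagCollector(HTMLParser):
    """Collect <title> text and <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            # Names are lowercased so "Department" and "department" collide.
            self.meta[attrs["name"].lower()] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

page = """<html><head><title>VPN setup guide</title>
<meta name="description" content="How to connect to the corporate VPN">
<meta name="Department" content="IT"></head><body>...</body></html>"""
collector = MetaTagCollector()
collector.feed(page)
print(collector.title, collector.meta)
```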

6. Promotion within the company

• Image: «you will find everything here»
• Integrate with other portals
• Offer search as a service
  • «Global search» widget
• Badges, gamification

Promotion

• Social best bets

Semantic search

• Cannot be solved in general
  • Analytics + fine tuning
  • See the practices above
• NLP: question answering
  • Rocket science
  • English only
  • Part-of-speech tagging, dependency parsing
  • Stanford NLP, OpenNLP, IR

«References»

• Patents: http://goo.gl/20sbR
• Explain Rank page: http://goo.gl/o3ZmN
• How SP2013 relevancy models work: http://goo.gl/arf0P
• MS Enterprise Search approach: http://goo.gl/x8SDO
• Customizing ranking models in SP2013: http://goo.gl/lBJAp


Thanks

Skype: Alexey_Kozhemiakin
Email: Alexey.Kozhemiakin@gmail.com
Blog: http://powersearching.wordpress.com
