SharePoint 2007 Best Practices Final

Enterprise Search ITP278

Upload: marianne-sweeny

Post on 14-Jan-2015


DESCRIPTION

Delivered at the SharePoint Best Practices Conference in La Jolla, CA, Feb 6 to 9

TRANSCRIPT

Page 1: SharePoint 2007 Best Practices Final

Enterprise Search

ITP278

Marianne Sweeny
• Ascentium, www.ascentium.com
• Mariannesweenyascentiumcom
• Director of Search Services
• Web producer at Microsoft for 7+ years
• Pointy-head, not propeller-head

Agenda
• Introduction
• MOSS 2007 Search
• Configuring MOSS Search
• Here There Be Dragons
• Resources
• Appendix

Introduction

• July 2008: Google acknowledges that its spiders have found 1 TRILLION unique URLs on the Web
• 2000: 1 billion pages
• 1999: 26 million pages

There is No Magic Bullet
• Susan Feldman (IDC), Enterprise Search Summit West 2008
  - Employees average 35 hours/week searching
  - Cost = $5000 per employee per year
• There can be no "silver bullet" solution for finding information
  - Customers don't know what they don't know
  - "Google experience" is finding what they want/need in the first few pages, and not necessarily Google itself
  - Enterprises have different lines of business and different information types
• Search of tomorrow is here today
  - Personalized to the device and user
  - Contextual
  - Flexible
  - Secure
  - Adaptable

Search Index: A Different Kind of Database

[Figure labels: Search Engine Index; SQL Server Index]

Web Search and Enterprise Search

Web Search
• Publishers want their content to be found
• Anarchistic publishing model = "anyone, anywhere, any time"
• Unlimited document set
• No real standards or code, more like guidelines
• No central authority
• Spam
• Commercialization
• Technology is agnostic
• Has to work the same for everyone worldwide
• No shared understanding

Enterprise Search
• Publishers do not think about document discoverability
• Controlled corpus of documents
• Standards and practices in place
• No spam
• Users and authors generally share contextual understanding
• Customized tagging or metadata
• Can customize search technology to enterprise themes and concepts

"Successful enterprise search efforts target corpuses of information and set search scopes appropriately. I&KM pros are wise to study information worker context before trying to 'Google-ize' their enterprises." (Forrester Search Wave, Q2 2008)

Advanced Search
• Few customers use it, and those that do are disappointed
• Boolean or SQL operators work sporadically
• Confusing message: what is "regular" search... not as effective?
• Search has progressed beyond the stages of Advanced: Filters, Facets, Context

MOSS 2007 Search

Query engine breaks the search terms down

Index engine stores the properties

Content index stores the text

Better Than Ever

MOSS 2007
• Relevance customizable to the enterprise content
  - Automated metadata extraction
  - Enhanced text analysis
• Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
• Single search system and index per server farm
• Custom content groups, Best Bets, and scheduling are now shared services
• Scopes can be tied to document properties
• Improved control over indexing

SharePoint 2003
• Relevance keyed on numeric values derived solely from document text
  - Collection frequency
  - Term frequency
  - Document length
  - Term position
• Different systems between Windows SharePoint Services and SharePoint Portal Server
• Multiple indexes
• Custom content groups, Best Bets, and scheduling configurations are portal-based
• Scopes tied to content sources
• Index propagated at completion of master crawl only

Simplified Administration UI
• Search settings page at the SSP level
• Managing crawls
  - Content sources
  - Explicit SharePoint Content Source Type
  - Content source for Business Data (Enterprise CAL)
• Crawl logs
  - Snapshot of crawled content in your index: lists all documents found in the content source and their status
  - Filters by date, site, etc.
  - Summary by host name (of successes, errors, and warnings)
• Crawl rules
  - Included and excluded rules
  - Ability to pre-test crawl rules
  - Easy to change order of crawl rules
• Managing scopes
  - Scopes decoupled from content sources
  - Scopes can span multiple content sources
  - Scope by Property, Site, Content Source, and URL

Indexing Performance Improvements
• Search is a shared service
  - Unified WSS and MOSS search for 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and best bets can also be administered at the consuming sites
• Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
• Content index that holds text of pages, with Property store that holds other document values
• Propagate data incrementally, as it is being indexed, to the query servers
  - Propagation starts within 30 seconds of the first shadow index written
  - No need to wait till the end of the crawl for information to be available in queries
  - No propagation of properties
• Single item add/removal without re-indexing entire corpus, with continuous propagation
  - Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed
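
The change-log idea above can be illustrated with a small sketch. This is not MOSS code: it is a toy Python model in which the index is a plain dictionary, and the `Change` record and its "add" / "update" / "delete" labels are hypothetical names chosen for the example. The point is only that the crawler touches the logged items and nothing else.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Change:
    item_id: str
    kind: str        # hypothetical labels: "add", "update", or "delete"
    text: str = ""


def apply_change_log(index: dict[str, str], changes: list[Change]) -> int:
    """Apply only the logged changes to the index; return how many items were touched."""
    touched = 0
    for change in changes:
        if change.kind == "delete":
            # Remove a single item without rebuilding anything else.
            if index.pop(change.item_id, None) is not None:
                touched += 1
        else:
            # "add" and "update" both (re)index just this one item.
            index[change.item_id] = change.text
            touched += 1
    return touched


index = {"doc1": "old text", "doc2": "budget", "doc3": "roadmap"}
log = [
    Change("doc1", "update", "new text"),
    Change("doc3", "delete"),
    Change("doc4", "add", "memo"),
]
touched = apply_change_log(index, log)
```

In this toy run, `doc2` is never visited, which is the saving a change-log crawl is after.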

Relevance Types
• Dynamic ranking = relevance impacted by query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
• Static ranking = relevance independent of customer query
  - URL depth
  - Click distance
  - Authority/demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items

Relevance Enhancements
• Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0-result queries
• Search Alerts
  - User can subscribe to receive email when results change
• File type filtering
  - Some file types are deemed more relevant (i.e. HTML, DOC) than others (XML, txt)
  - Supports 220 file types, from MS and non-MS applications
• Property weights
  - Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.

MOSS 2007 Faceted Search
• Facets are predetermined content categories presented to the customer to narrow search results
  - Can be presented pre- or post-query
  - Used for Advanced search
• Empowers customer to most effectively refine their search
• Filters results by predetermined categories

Federated Search
• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that subscribe to OpenSearch 1.1
• Passes the query into the subscribed resource and returns results into single interface
• Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
• Can develop own FLD files if destination subscribes to OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
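
"Passes the query into the subscribed resource" comes down to filling in a URL template. OpenSearch 1.1 describes templates with placeholders such as `{searchTerms}`, `{count}`, and `{startIndex}`, where a trailing `?` marks a parameter as optional. The Python sketch below shows that substitution only; the host `search.example.com` and the function name `fill_template` are made up for the example, and this is not how MOSS itself is implemented.

```python
import re
from urllib.parse import quote


def fill_template(template: str, params: dict) -> str:
    """Substitute OpenSearch-style {name} and {name?} placeholders in a URL template."""

    def replace(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            # URL-encode the value so spaces and symbols survive the trip.
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may simply be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", replace, template)


# Hypothetical federated location; only the placeholder syntax is from OpenSearch 1.1.
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&n={count?}"
url = fill_template(template, {"searchTerms": "goat rodeo", "count": 10})
```

The remote resource ranks its own results, which is why the slide notes that relevance follows the originating resource's criteria.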

People Search
• Build and publish rich personal profiles
  - Customize personal profile attributes
  - Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
  - Control access to information using security and privacy controls
  - Generate and display organizational charts based on directory information
  - Publish personal profiles using MOSS My Sites
• Identify people who can help
  - Find people based on keyword matches with MOSS personal profiles
  - Find people in line-of-business systems
  - Filter results by common attributes such as Job Title or Department
  - Find "in-common" connections including managers, site memberships, distribution lists, and colleagues
  - Group results by social distance
  - Subscribe to People Alerts

People Search Results Page
• Find people by project, expertise, or...
• Filter by relevant attributes
• Contact information & online availability

LOB Applications with BDC
• Extracts data from line-of-business, CRM, and other 3rd-party data stores
• Caches for indexing by search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application

FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
• Why is it unique?
  - Auto classification
  - Advanced linguistics: text mining for concept and relationship mapping
  - Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
  - Precision: exact word matching, exact phrase matching, proximity, tokenization
  - Location-aware results (retail and news): excellent for mobile search
  - Recommendation engine
  - Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results
• Search Scopes
  - Allow users to refine search through filtering
  - Define content resources and map to business rules/key concepts
  - Focused content = shared understanding = more precise results
• Duplicate results filtering
  - Collapsing duplicates from same directory or site to leave more room for other relevant results
  - Less favoritism, more results on desired page 1
• Definitions
  - Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  - A web property on the Search Best Bets web part (can turn on/off display of definition)
  - Returned in the Query Object Model
  - Cannot be edited
• Best Bets
  - Editorially assigned results based on these key concepts assigned to selected query terms
  - Can be many-to-many

Scalability
• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  - i.e. heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security
• Query-time stripping: customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
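
Query-time stripping can be pictured as a filter applied to the result list just before it is shown. The sketch below is a toy Python model, not the MOSS implementation: each result carries a hypothetical `acl` set of group names, and a result survives only if the searching user belongs to at least one of those groups.

```python
def trim_results(results, user_groups):
    """Keep only results whose ACL shares at least one group with the user."""
    return [r for r in results if r["acl"] & user_groups]


# Hypothetical index entries; URLs and group names are invented for the example.
results = [
    {"url": "http://intranet/hr/salaries.xls", "acl": {"hr"}},
    {"url": "http://intranet/handbook.doc", "acl": {"hr", "staff"}},
    {"url": "http://intranet/board/minutes.doc", "acl": {"board"}},
]

visible = trim_results(results, {"staff"})
```

Because the check runs per query, a permission change takes effect as soon as the stored ACLs are refreshed, which is what the "security only" crawl is for.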

Search Analytics
• Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
• Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
• Key Performance Indicators [KPI]
  - Create a KPI list or other measures of success
  - Default KPIs exist in OOB deployment
  - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap
• Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
• Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes to map for themes
• Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
• Assemble a cross-functional team to
  - Assign relevance weighting that makes sense to the customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages the structural elements like URL depth and click distance
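
Reviewing search logs for pain points can start very simply. The Python sketch below assumes the log has already been exported to rows of (query, number of results, whether anything was clicked); that row layout, the function names, and the 25% click-through threshold are all assumptions made for the example, not a MOSS log format.

```python
from collections import defaultdict


def summarize(log):
    """Aggregate (query, result_count, clicked) rows per normalized query."""
    stats = defaultdict(lambda: {"searches": 0, "clicks": 0, "zero": 0})
    for query, result_count, clicked in log:
        s = stats[query.strip().lower()]
        s["searches"] += 1
        s["clicks"] += 1 if clicked else 0
        s["zero"] += 1 if result_count == 0 else 0
    return stats


def zero_result_queries(stats):
    """Queries that never returned anything: candidates for Best Bets or synonyms."""
    return sorted(q for q, s in stats.items() if s["zero"] == s["searches"])


def low_click_through(stats, threshold=0.25, min_searches=2):
    """Queries that return results nobody clicks: candidates for relevance review."""
    return sorted(
        q for q, s in stats.items()
        if s["zero"] < s["searches"]
        and s["searches"] >= min_searches
        and s["clicks"] / s["searches"] < threshold
    )


log = [
    ("expense report", 12, True),
    ("Expense Report", 12, True),
    ("goat rodeo", 0, False),
    ("vacation policy", 40, False),
    ("vacation policy", 40, False),
    ("vacation policy", 40, False),
]
stats = summarize(log)
no_results = zero_result_queries(stats)
ignored = low_click_through(stats)
```

The two lists answer different questions: the first shows what customers ask for that the corpus or vocabulary does not cover, the second shows where results exist but do not satisfy.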

Pareto's Principle
• Known as the 80/20 rule
• Named after late 19th century economist
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
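
Whether the 80/20 split actually holds for a given corpus is something the click data can show. The Python sketch below is a minimal check under stated assumptions: `clicks` is a hypothetical list with one document identifier per result click, and the sample numbers are invented so that the split comes out at exactly 80/20.

```python
from collections import Counter


def coverage_of_top(clicks, fraction=0.2):
    """Share of all result clicks that land on the top `fraction` of documents."""
    counts = sorted(Counter(clicks).values(), reverse=True)
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Invented sample: 10 documents, 100 clicks, two documents take 80 of them.
clicks = ["home"] * 50 + ["benefits"] * 30
for i, n in enumerate([3, 3, 3, 3, 2, 2, 2, 2]):
    clicks += [f"doc{i}"] * n

share = coverage_of_top(clicks)
```

The remaining documents in the sample are the long tail: individually small, but together still a fifth of the traffic.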

Define Content
• Define content scopes
  - Segment content into logical groups
  - Create scope rule based on
    - Address
    - Property query
    - Content source
  - At the SSP level or individual level
  - SSP-level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  - Terms or language proprietary to the enterprise, i.e. "goat rodeo"
  - Provides additional clarification for searcher
  - Use synonym mapping for term variants, e.g. C# and Csharp
  - Two information points can be displayed for a special term
    - Definition of the term
    - Best Bet
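
Synonym mapping and special terms both amount to lookups keyed on the query term. The Python sketch below models them with two dictionaries; the table contents, the placeholder definition text, and the `intranet.example.com` URL are invented for the example, and none of this is the MOSS keyword API.

```python
# Hypothetical editorial tables; in MOSS these are maintained as keywords and synonyms.
SYNONYMS = {
    "c#": {"csharp"},
    "csharp": {"c#"},
}
SPECIAL_TERMS = {
    "goat rodeo": {
        "definition": "(enterprise-specific definition goes here)",
        "best_bet": "http://intranet.example.com/glossary/goat-rodeo",
    },
}


def expand_query(query):
    """Return the normalized query term plus any mapped synonyms."""
    term = query.strip().lower()
    return {term} | SYNONYMS.get(term, set())


def special_term_info(query):
    """Return the definition and Best Bet for a special term, or None."""
    return SPECIAL_TERMS.get(query.strip().lower())


terms = expand_query("C#")
info = special_term_info("Goat Rodeo")
```

A searcher typing either variant then matches documents that use the other, and a proprietary term brings its definition and Best Bet along with the regular results.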

Designate Authority Sites
• Hilltop Algorithm
  - Quality of links more important than quantity of links
  - Segmentation of corpus into broad topics
  - Selection of authority sources within these topic areas
  - Pre-query calculation applied at query time
• Topic Sensitive PageRank
  - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  - Pre-query calculation of factors based on subset of corpus
    - Context of term use in document
    - Context of term use in history of queries
    - Context of term use by user submitting query
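
Authority sites matter because click distance is measured from them. A common way to picture click distance is the fewest link clicks needed to reach a page starting from any authority page, which is a breadth-first search over the link graph. The Python sketch below shows that calculation on an invented link graph; it is a toy model of the idea, not the ranking code MOSS runs.

```python
from collections import deque


def click_distances(links, authorities):
    """Breadth-first search: fewest clicks from any authority page to each page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical site structure: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/it/helpdesk", "/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/dental"],
}

from_home = click_distances(links, ["/"])
with_hr_authority = click_distances(links, ["/", "/hr/benefits"])
```

Promoting `/hr/benefits` to an authority pulls the dental page from three clicks away to one, which is the kind of lift that is lost when no authority sites are configured.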

Educate: Structural Influences
• File Type Bias
  - In order of relevancy (highest to lowest)
    - HTML Web pages
    - PowerPoint presentations
    - Word documents
    - Emails
    - XML files
    - Excel spreadsheets
    - Plain text files
    - List items
• Auto Language Detect
  - Foreign-language results are less relevant than results in user's language
  - English language is always considered as relevant as user's language
• URL Depth and Click Distance
  - Short URLs are like prime real estate
  - Items with shorter URLs are considered more relevant than items placed in longer URLs
    - The level is determined by reviewing the number of slash ("/") characters in the URL
  - Keywords separated by hyphens in the URL are good
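
Counting slashes is easy to try against your own URLs. The Python sketch below counts slashes in the path portion only, ignoring the `//` after the scheme and any trailing slash; that normalization is an assumption made for the example, and the exact counting rule inside MOSS may differ.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the slash characters in the path portion of a URL."""
    path = urlparse(url).path
    return path.rstrip("/").count("/")


# Hypothetical intranet URLs, listed shallowest first after sorting.
urls = [
    "http://intranet/hr/benefits/2008/dental-plan.aspx",
    "http://intranet/",
    "http://intranet/hr/dental-plan.aspx",
]
by_depth = sorted(urls, key=url_depth)
```

Moving a key page from the four-level location to the two-level one is the kind of structural change the slide has in mind.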

Educate: Content Influences
• Anchor Link Text
  - Search indexes the anchor text from the following elements
    - HTML anchor elements
    - SharePoint Services link lists
    - SharePoint Portal Server 2003 listings
    - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
• Metadata extraction
  - Shadow title detection is provided within the body of the item
    - Primarily based on text formatting features
    - Shadow title is added automatically to the document
    - Weighted the same as the original title
    - Only for Microsoft Office file types
• Auto Description text
• Optimized URLs
  - Enterprise Search checks URL matching at query time
  - If query matches the host name of a page in the index, it will display as the first result

Enhanced Search Results
• Synonym Mapping
• Best Bets
• Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
• In most cases your search index is on its own server

Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
• Note the infrastructure update where Microsoft rolled the features of Search Server 2008 into MOSS 2007, which includes federated search ability and a unified administration dashboard. Read more here: httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
• Also please note that it is not an easy installation, and that users must read the entire documentation for it before upgrading their portal
  - More people destroy their portal than upgrade it, due to not reading the documentation and not installing the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2
• Use the Web part that accommodates wildcard search. Found here: httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capacity is predicated on the My Sites profile
  - Employee participation critical to optimal functionality
• Benefits of click-distance are missed if Authority sites are not configured

Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, thesaurus files not upgraded

Resources
• Microsoft Enterprise Search website: httpwwwmicrosoftcomenterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
• Tune Search Server 2008: httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
• MOSS Developer Center on MSDN: httpmsdnmicrosoftcomofficeservermossdefaultaspx
• MOSS 2007 Software Developers Kit: httpmsdn2microsoftcomen-uslibraryms550992aspx
• MOSS 2007 on TechNet: httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
• Search Optimization for a MOSS 2007 Content Management site: httpmsdnmicrosoftcomen-uslibrarycc721591aspx
• Faceted Search from the Microsoft SharePoint Team Blog: httpblogsmsdncomsharepointarchive20080317open

More Resources
• Enterprise search blog: httpblogsmsdncomenterprisesearch
• MOSS BDC Search: httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
• Find it All with SharePoint Enterprise Search: httptechnetmicrosoftcomen-usmagazinecc162512aspx
• Google Enterprise Connector for MOSS 2007: httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
• Ontolica Search for MOSS 2007: httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
• Michael Gannotti on SharePoint: httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
• Sitemap.xml Generator: httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
• SEO Advice from a Propellerhead for ...: httpwwwmossseocom

Even More Resources
• MOSS 2007 Administrator Documentation: httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: httpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
• All About SharePoint, SS Ahmed: httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
• Working with MOSS search, creating scopes: httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
• MOSS 2007 search customization: httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
• MOSS 2007 Search & Indexing: httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
• Create a custom Search Page: httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products
• Concept Searching
  - Auto-classifies documents for MOSS 2007
  - Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
  - Extracts concepts and weights their relevance to searcher query
    - Presents for search refinement
  - httpwwwconceptsearchingcomconceptHMSO (insider trading)
• Integration with MOSS
  - Extracts metadata and compound terms
  - Incorporates with existing taxonomy if one exists
  - Appends metadata and stores as MOSS property
  - Part of the main MOSS index
  - Uses standard MOSS administration features

Adjusting Relevance: Property weights
• Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are assumed to be supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it has finished
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users
• Alerts
  - Same alerting infrastructure for WSS and MOSS
    - Timer service is used to handle all alert notifications
  - Frequency can be set to Daily/Weekly
    - Notifications for search alerts will be sent according to the creation time
  - 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  - A rollup of all of a user's alerts for a site collection
    - http://<sitecollection>/_layouts/MySubs.aspx
  - Alert "gotchas"
    - No "My Alerts Summary" web part
    - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
• RSS Feeds
  - Ability to subscribe for an RSS feed on the search results
  - 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• httpmsdnmicrosoftcomlibraryen-usspssdkhtml_introduction_to_a_protocol_handleraspframe=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are assumed to be supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
• Crawled properties
  - Emitted by iFilters and Protocol Handlers
  - Identified by a property set (GUID) and property ID (name or numeric ID)
• Managed properties
  - Mapping target for crawled properties (many-to-many)
  - Identified by internal ID
  - Friendly name used in queries
    - Can be used in the query with property:value
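
The many-to-many mapping can be pictured as a lookup table from crawled properties to managed properties. The Python sketch below is a toy model: the property-set labels stand in for the real GUIDs, and the table entries, function names, and sample values are all invented for the example rather than taken from the MOSS schema.

```python
# Hypothetical mapping table: (property set, property ID) -> managed property names.
MAPPINGS = {
    ("office-propset", "Author"): ["Author"],
    ("html-meta", "dc.creator"): ["Author"],
    ("office-propset", "Title"): ["Title", "DisplayTitle"],
}


def to_managed(crawled):
    """Collect crawled property values under the managed properties they map to."""
    managed = {}
    for key, value in crawled.items():
        for name in MAPPINGS.get(key, []):   # unmapped crawled properties are skipped
            managed.setdefault(name, []).append(value)
    return managed


def parse_property_query(query):
    """Split a property:value restriction into its managed property name and value."""
    name, _, value = query.partition(":")
    return name.strip(), value.strip().strip('"')


crawled = {
    ("office-propset", "Author"): "M. Sweeny",
    ("html-meta", "dc.creator"): "Marianne Sweeny",
    ("office-propset", "Title"): "Enterprise Search",
    ("unknown-propset", "x"): "ignored",
}
managed = to_managed(crawled)
restriction = parse_property_query('author:"Marianne Sweeny"')
```

Two crawled properties feed one managed property (`Author`), and one crawled property feeds two (`Title`, `DisplayTitle`), which is the many-to-many case on the slide.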

Page 2: Share Point2007 Best Practices Final

Marianne Sweeny Ascentium wwwascentiumcom Mariannesweenyascentiumcom Director of Search Services Web producer

at Microsoft for 7+ years pointy-head not propeller-head

Agenda Introduction MOSS 2007 Search Configuring MOSS Search Here There Be Dragons Resources Appendix

Introduction

July 2008 Google acknowledges that its spiders have found 1 TRILLION unique URLs on the Web

2000 1 billion pages1999 26 million pages

There is No Magic Bullet Susan Feldman (IDC) Enterprise Search Summit West 2008

ndash Employees average 35 hoursweek searchingndash Cost = $5000 per employee per year

There can be no ldquosilver bulletrdquo solution for finding informationndash Customers donrsquot know what they donrsquot knowndash ldquoGoogle experiencerdquo is finding what they wantneed in the first

few pages and not necessarily Google itselfndash Enterprises have different lines of business and different

information types Search of tomorrow is here today

ndash Personalized to the device and userndash Contextualndash Flexiblendash Securendash Adaptable

Search Index A Different Kind of Database

Search Engine Index SQL Server Index

Web Search and Enterprise Search

Publishers want their content to be found

Anarchistic publishing model = ldquoanyone anywhere any timerdquo

Unlimited document set No real standards or code more like

guidelines No central authority Spam Commercialization Technology is agnostic Has to work the same for everyone

worldwide No shared understanding

Enterprise Search

Successful enterprise search efforts target corpuses of information and set search scopes appropriately IampKM pros are wise to study information worker context before trying to ldquoGoogle-izerdquo their enterprises Forrester Search Wave Q2 2008

Web Search Publishers do not think about

document discoverability Controlled corpus of documents Standards and practices in place No spam Users and authors generally

share contextual understanding Customized tagging or metadata Can customize search

technology to enterprise themes and concepts

Advanced Search Few customers use it and those that do are

disappointed Boolean or SQL operators work sporadically

Confusing message What is ldquoregularrdquo searchhellipnot as effective

Search has progressed beyond the stages of Advanced Filters Facets Context

MOSS 2007 Search

Query engine breaks the search terms down

Index engine stores the properties

Content index stores the text

Better Than EverMOSS 2007 Relevance customizable to the

enterprise content Automated metadata extraction Enhanced text analysis

Fully integrated admin experience between Windows

SharePoint Services v3 and MOSS 2007 Single search system and index

per server farm Custom content groups Best

Bets scheduling are now shared services

Scopes can be tied to document properties

Improved control over indexing

SharePoint 2003 Relevance keyed on numeric values

derived solely from document text Collection frequency Term frequency Document length Term position

Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best

Bets scheduling configurations are portal-based

Scopes tied to content sources Index propagated at completion of

master crawl only

Simplified Administration UISearch settings page at the SSP levelManaging crawls

bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)

Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the

content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)

Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules

Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL

Indexing Performance Improvements
Search is a shared service
- Unified WSS and MOSS search for 1 index per SSP
- Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
- Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
Content index that holds text of pages, with a property store that holds other document values
Propagate data incrementally to the query servers as it is being indexed
- Propagation starts within 30 seconds of the first shadow index being written
- No need to wait until the end of the crawl for information to be available in queries
- No propagation of properties
Single item add/removal without re-indexing the entire corpus, with continuous propagation
- Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
- Security Change Only Crawl: no need to fully index all the content of a site when permissions on the site have changed

Relevance Types
Dynamic ranking = relevance impacted by query term
- Frequency
- Location in document
- Appearance in link text
- Appearance in URL
Static ranking = relevance independent of customer query
- URL depth
- Click distance
- Authority/demoted site
- Change property weights
- Language of customer (browser setting)
- Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items

Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
- Use search logs to detect popular searches, low click-through from results, or 0-result queries
Search Alerts
- User can subscribe to receive email when results change
File type filtering
- Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, TXT)
- Supports 220 file types, from Microsoft and non-Microsoft applications
Property weights
- Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
- Change default property weights through the Schema Object Model
- Note: the weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance
Marcy Tobin wants me to tell you that this is not a trivial undertaking

MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
Empowers the customer to refine their search most effectively
Filters results by predetermined categories

Federated Search
Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that subscribe to OpenSearch 1.1
Passes the query to the subscribed resource and returns results in a single interface
Relevance calculation is done according to the originating resource's criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop your own FLD files if the destination subscribes to OpenSearch 1.1
- Day Software has developed a standard connector for LiveLink ECM
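How a query is passed to an OpenSearch 1.1 source can be sketched as filling in the source's URL template. OpenSearch 1.1 marks parameters as `{name}` and optional ones as `{name?}`. The endpoint `search.example.com`, the `fill_template` helper, and the template itself are hypothetical. The FLD file format and MOSS's own handling are not shown.

```python
import re
from urllib.parse import quote

# Hypothetical OpenSearch 1.1 URL template for an outside content source.
TEMPLATE = ("http://search.example.com/results"
            "?q={searchTerms}&start={startIndex?}&lang={language?}&format=rss")

def fill_template(template, search_terms, start_index=1):
    """Substitute OpenSearch template parameters with URL-encoded values."""
    values = {
        "searchTerms": quote(search_terms, safe=""),
        "startIndex": str(start_index),
    }

    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return values[name]
        if optional:
            return ""  # unknown optional parameters become empty strings
        raise KeyError(name)  # a required parameter we cannot supply

    return re.sub(r"\{([A-Za-z]+)(\??)\}", substitute, template)
```

The federated source then ranks and returns its own results, which is why its relevance criteria apply rather than those of MOSS 2007.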

People Search
Build and publish rich personal profiles
- Customize personal profile attributes
- Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
- Control access to information using security and privacy controls
- Generate and display organizational charts based on directory information
- Publish personal profiles using MOSS My Sites
Identify people who can help
- Find people based on keyword matches with MOSS personal profiles
- Find people in line-of-business systems
- Filter results by common attributes such as Job Title or Department
- Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
- Group results by social distance
- Subscribe to People Alerts

People Search Results Page
Find people by project, expertise, or…
Filter by relevant attributes
Contact information & online availability

LOB Applications with BDC
Extracts data from line-of-business, CRM, and other 3rd-party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communications Server for connectivity options
Aggregated into a single application

FAST ESP Technology
FAST is a sophisticated search engine tailor-made for ecommerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
- Auto classification
- Advanced linguistics: text mining for concept and relationship mapping
- Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
- Precision: exact word matching, exact phrase matching, proximity, tokenization
- Location-aware results (retail and news): excellent for mobile search
- Recommendation engine
- Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results
Search Scopes
- Allow users to refine search through filtering
- Define content resources and map to business rules/key concepts
- Focused content = shared understanding = more precise results
Duplicate results filtering
- Collapses duplicates from the same directory or site to leave more room for other relevant results
- Less favoritism, more results on the desired page 1
Definitions
- Automatically extract "definitions" from indexed content and display them as matches directly on the results page
- A web property on the Search Best Bets web part (can turn display of the definition on/off)
- Returned in the Query Object Model
- Cannot be edited
Best Bets
- Editorially assigned results, based on key concepts, assigned to selected query terms
- Can be many-to-many

Scalability
No physical limit on the maximum number of documents in one index
Recommended limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large and small documents count the same
The 'average document size' depends on the corpus mix
- i.e. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware

Security
Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
- Implements the ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
- Still provides the same security trimming functionality
- Automatically configured for new sites
Search visibility options
- Prevent sites/lists from appearing in search results at a site/list level
"Security only" crawl for single item add/removal
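Query-time stripping can be pictured as a filter applied to the result set just before it is shown. The sketch below is not the MOSS implementation: it assumes each indexed item carries a list of principals allowed to read it, and the `trim_results` helper, group names, and URLs are invented.

```python
def trim_results(results, user_principals):
    """Keep only results whose ACL shares at least one principal with the user."""
    principals = set(user_principals)
    return [r for r in results if principals & set(r["acl"])]

# Hypothetical result set as it might come back from the index.
RESULTS = [
    {"url": "http://intranet/hr/policy.doc", "acl": ["Everyone"]},
    {"url": "http://intranet/hr/salaries.xls", "acl": ["HR Managers"]},
    {"url": "http://intranet/finance/q3.ppt", "acl": ["Finance", "Executives"]},
]

# A user in "Everyone" and "Finance" never sees the HR Managers item.
visible = trim_results(RESULTS, ["Everyone", "Finance"])
```

Because trimming depends on stored permissions, a permission change only requires the security information to be refreshed, not the document text. That is the idea behind the "Security only" crawl.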

Search Analytics
Export search logs to Excel
- Query terms
- Page views
- Number of results returned
- Volume trends
- Query success: can define success for certain query terms
Report Center
- Access to MOSS 2007 BI features
- Filters data for permissions and relevance
Key Performance Indicators [KPI]
- Create a KPI list or other measures of success
- Default KPIs exist in OOB deployment
- KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
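Once the search logs are exported, the three signals worth pulling out are popular queries, 0-result queries, and queries with low click-through. The sketch assumes each log row has been reduced to a query string, a result count, and a clicked flag. That row layout, the `summarize_log` helper, and the 20% threshold are assumptions, not the MOSS log format.

```python
from collections import Counter

def summarize_log(entries, top_n=3, low_ctr=0.2):
    """Summarize popular, zero-result, and low click-through queries."""
    counts, clicks, zero = Counter(), Counter(), Counter()
    for entry in entries:
        term = entry["query"].strip().lower()
        counts[term] += 1
        if entry["results"] == 0:
            zero[term] += 1
        if entry["clicked"]:
            clicks[term] += 1
    # Low click-through: results were returned but few searches led to a click.
    low = sorted(t for t in counts
                 if zero[t] == 0 and clicks[t] / counts[t] < low_ctr)
    return {
        "popular": counts.most_common(top_n),
        "zero_results": sorted(zero),
        "low_click_through": low,
    }

# Hypothetical exported log rows.
LOG = [
    {"query": "vacation policy", "results": 12, "clicked": True},
    {"query": "Vacation Policy", "results": 12, "clicked": True},
    {"query": "vacation policy", "results": 12, "clicked": False},
    {"query": "goat rodeo", "results": 0, "clicked": False},
    {"query": "expense report", "results": 40, "clicked": False},
    {"query": "expense report", "results": 40, "clicked": False},
]
summary = summarize_log(LOG, top_n=2)
```

Zero-result terms are candidates for Best Bets or synonyms. Low click-through terms suggest the right content exists but is ranked or described poorly.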

Configuring MOSS 2007 Search

Search Roadmap
Useful participants
- Content creators
- Information Architect/User Experience Architect
- Taxonomist
Define key enterprise themes in content
- Map existing content to these themes
- Create filters and scopes that map to these themes
Get as much customer data as possible to find search pain points
- Review search logs and customer feedback mechanisms
- What are they trying to find?
- What terms are they using?
Assemble a cross-functional team to
- Assign relevance weighting that makes sense for the customer behavior and the corpus
- Develop Best Bets for searches with 0 results
- Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
- Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
- Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle
Known as the 80/20 rule
Named after a late 19th century economist
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
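Whether the 80/20 rule holds for a given corpus can be checked from click data. This sketch assumes you already have a click count per document. The `top_share` helper and the sample numbers are illustrative only.

```python
import math

def top_share(click_counts, fraction=0.2):
    """Share of all clicks that go to the most-clicked `fraction` of documents."""
    ranked = sorted(click_counts.values(), reverse=True)
    top_n = max(1, math.ceil(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

# Hypothetical click counts for ten documents.
CLICKS = {"doc1": 500, "doc2": 300, "doc3": 50, "doc4": 40, "doc5": 30,
          "doc6": 30, "doc7": 20, "doc8": 15, "doc9": 10, "doc10": 5}
share = top_share(CLICKS)  # top 2 of 10 documents: 800 of 1000 clicks
```

The remaining documents are the Long Tail: individually rare, but together still a fifth of the clicks in this made-up sample.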

Define Content
Define content scopes
- Segment content into logical groups
- Create scope rule based on address, property query, or content source
- At the SSP level or individual level
- SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
- Terms or language proprietary to the enterprise, e.g. "goat rodeo"
- Provides additional clarification for the searcher
- Use synonym mapping for term variants, e.g. C# and Csharp
- Two information points can be displayed for a special term: a definition of the term and a Best Bet
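The special-term idea above amounts to a lookup table: every variant of a term points at one keyword entry, and that entry carries the definition and the Best Bets. In MOSS this is configured through the keyword administration pages, not in code. The dictionary layout, the `special_term` helper, and the sample URL are hypothetical.

```python
# Hypothetical keyword list: one entry per special term.
KEYWORDS = {
    "c#": {
        "synonyms": ["csharp", "c sharp"],
        "definition": "Programming language used for custom SharePoint code.",
        "best_bets": ["http://intranet/dev/csharp-guidelines.aspx"],
    },
}

def build_lookup(keywords):
    """Map the keyword and each of its synonyms back to the keyword itself."""
    lookup = {}
    for term, entry in keywords.items():
        for variant in [term, *entry["synonyms"]]:
            lookup[variant.lower()] = term
    return lookup

def special_term(query, keywords=KEYWORDS):
    """Return the definition and Best Bets for a query, or None if no match."""
    term = build_lookup(keywords).get(query.strip().lower())
    if term is None:
        return None
    entry = keywords[term]
    return {"keyword": term,
            "definition": entry["definition"],
            "best_bets": entry["best_bets"]}
```

A search for "Csharp" and a search for "C#" would both surface the same definition and Best Bet, while an unmapped query falls through to ordinary results.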

Designate Authority Sites
Hilltop Algorithm
- Quality of links more important than quantity of links
- Segmentation of corpus into broad topics
- Selection of authority sources within these topic areas
- Pre-query calculation applied at query time
Topic-Sensitive PageRank
- Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
- Pre-query calculation of factors based on subset of corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query

Educate Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
- HTML Web pages
- PowerPoint presentations
- Word documents
- Emails
- XML files
- Excel spreadsheets
- Plain text files
- List items
Auto Language Detect
- Foreign language results are less relevant than results in the user's language
- English language is always considered as relevant as the user's language
URL Depth and Click Distance
- Short URLs are like prime real estate
- Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
- Keywords separated by hyphens in the URL are good
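URL depth is easy to estimate for your own content before it is crawled. The sketch counts path segments after the host name, which is one reading of "number of slash characters". The exact counting MOSS uses is not stated here, and the `url_depth` helper and URLs are illustrative.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Number of path segments after the host name (0 for the site root)."""
    path = urlparse(url).path.strip("/")
    return 0 if not path else path.count("/") + 1

# Hypothetical URLs, listed shallowest first.
URLS = [
    "http://intranet/hr/benefits/2008/plan.doc",
    "http://intranet/",
    "http://intranet/benefits.aspx",
]
by_depth = sorted(URLS, key=url_depth)
```

Content that matters most should sit near the top of that sorted list. A key document buried four levels down starts with a static ranking disadvantage.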

Educate Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
- HTML anchor elements
- SharePoint Services link lists
- SharePoint Portal Server 2003 listings
- Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
- Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
- Primarily based on text formatting features
- Shadow title is added automatically to the document
- Weighted the same as the original title
- Only for Microsoft Office file types
Auto Description text
Optimized URLs
- Enterprise Search checks URL matching at query time
- If the query matches the host name of a page in the index, that page is displayed as the first result

Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases your search index is on its own server

Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
- Crawl rules
- Property and schema
- Best Bets keywords

Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
- Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
- More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2
Use the Web part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile
- Employee participation is critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
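Click distance is the number of link hops from an authoritative page to a given item, so it can be sketched as a breadth-first search over the link graph. This illustrates the concept rather than the MOSS ranking code. The `click_distance` helper, the link graph, and the page names are made up.

```python
from collections import deque

def click_distance(links, authorities):
    """Fewest clicks needed to reach each page from any authority page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical intranet link graph: page -> pages it links to.
LINKS = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "benefits": ["form.doc"],
    "it": [],
}
distances = click_distance(LINKS, ["home"])
```

With an empty authority list the function returns an empty dictionary. No page has a starting point to measure from, which is why the benefit is lost when authority sites are left unconfigured.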

Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
- Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
- Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
- Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
- http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
- Extracts metadata and compound terms
- Incorporates with existing taxonomy if one exists
- Appends metadata and stores as MOSS property
- Part of the main MOSS index
- Uses standard MOSS administration features

Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are placeholders carried over from the slide.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Poll once a second until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks

Push/Pull Data to Users
Alerts
- Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
- Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
- 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
- A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
- Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
- Ability to subscribe to an RSS feed of the search results
- 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the calling code.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // if we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
Crawled properties
- Emitted by iFilters and Protocol Handlers
- Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
- Mapping target for crawled properties (many-to-many)
- Identified by internal ID
- Friendly name used in queries
  - Can be used in the query with property:Value
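The relationship between crawled and managed properties can be sketched as a mapping table plus a tiny query parser. The crawled property names, the `to_managed` and `parse_query` helpers, and the rule that the first populated source wins are all assumptions for illustration. The real mapping is configured in the search administration pages.

```python
# Hypothetical mapping: managed property -> crawled properties that feed it.
MAPPINGS = {
    "Author": ["Office:4", "Mail:6"],
    "Title": ["Office:2", "Basic:displaytitle"],
}

def to_managed(crawled, mappings=MAPPINGS):
    """Build managed property values from one item's crawled properties."""
    managed = {}
    for name, sources in mappings.items():
        for source in sources:
            if source in crawled:  # assumption: first populated source wins
                managed[name] = crawled[source]
                break
    return managed

def parse_query(text):
    """Split a query into plain keywords and property:value filters."""
    keywords, filters = [], {}
    for token in text.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            filters[name] = value
        else:
            keywords.append(token)
    return keywords, filters
```

For example, `parse_query("budget Author:Sweeny")` separates the keyword from the filter on the friendly name `Author`. Quoted values containing spaces are not handled in this sketch.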

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 3: Share Point2007 Best Practices Final

Agenda Introduction MOSS 2007 Search Configuring MOSS Search Here There Be Dragons Resources Appendix

Introduction

July 2008 Google acknowledges that its spiders have found 1 TRILLION unique URLs on the Web

2000 1 billion pages1999 26 million pages

There is No Magic Bullet Susan Feldman (IDC) Enterprise Search Summit West 2008

ndash Employees average 35 hoursweek searchingndash Cost = $5000 per employee per year

There can be no ldquosilver bulletrdquo solution for finding informationndash Customers donrsquot know what they donrsquot knowndash ldquoGoogle experiencerdquo is finding what they wantneed in the first

few pages and not necessarily Google itselfndash Enterprises have different lines of business and different

information types Search of tomorrow is here today

ndash Personalized to the device and userndash Contextualndash Flexiblendash Securendash Adaptable

Search Index A Different Kind of Database

Search Engine Index SQL Server Index

Web Search and Enterprise Search

Publishers want their content to be found

Anarchistic publishing model = ldquoanyone anywhere any timerdquo

Unlimited document set No real standards or code more like

guidelines No central authority Spam Commercialization Technology is agnostic Has to work the same for everyone

worldwide No shared understanding

Enterprise Search

Successful enterprise search efforts target corpuses of information and set search scopes appropriately IampKM pros are wise to study information worker context before trying to ldquoGoogle-izerdquo their enterprises Forrester Search Wave Q2 2008

Web Search Publishers do not think about

document discoverability Controlled corpus of documents Standards and practices in place No spam Users and authors generally

share contextual understanding Customized tagging or metadata Can customize search

technology to enterprise themes and concepts

Advanced Search Few customers use it and those that do are

disappointed Boolean or SQL operators work sporadically

Confusing message What is ldquoregularrdquo searchhellipnot as effective

Search has progressed beyond the stages of Advanced Filters Facets Context

MOSS 2007 Search

Query engine breaks the search terms down

Index engine stores the properties

Content index stores the text

Better Than EverMOSS 2007 Relevance customizable to the

enterprise content Automated metadata extraction Enhanced text analysis

Fully integrated admin experience between Windows

SharePoint Services v3 and MOSS 2007 Single search system and index

per server farm Custom content groups Best

Bets scheduling are now shared services

Scopes can be tied to document properties

Improved control over indexing

SharePoint 2003 Relevance keyed on numeric values

derived solely from document text Collection frequency Term frequency Document length Term position

Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best

Bets scheduling configurations are portal-based

Scopes tied to content sources Index propagated at completion of

master crawl only

Simplified Administration UISearch settings page at the SSP levelManaging crawls

bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)

Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the

content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)

Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules

Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL

Indexing Performance Improvements Search is a shared service

ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered

centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites

Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo

Content index that holds text of pages with Property store that holds other document values

Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties

Single item add removal without re-indexing entire corpus with continuous propagation

ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items

ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed

Relevance Types Dynamic ranking = relevance impacted by query term

ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL

Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails

XML files Excel spreadsheets Plain text List items

Relevance EnhancementsManually assign synonyms and editorialized results to keywords

ndash Use search logs to detect popular searches low click-through from results or 0 result queries

Search Alertsndash User can subscribe to receive email when results change

File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)

than others (XML txt)ndash Supports 220 files types MS and non-MS application

Property weights ndash Assign different weights to properties so that important

properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object

Modelndash Note The weights used in the product were carefully tested

Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial

undertaking

MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results

bullCan be presented pre- or post- querybullUsed for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People SearchBuild and publish rich personal profiles

Customize personal profile attributes Populate personal profiles using information from Active Directory other

LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory

information Publish personal profiles using MOSS My Sites

Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships

distribution lists and colleagues Group results by social distance Subscribe to People Alerts

People Search Results Page

Find people by project expertise orhellip

Find people by project expertise orhellip

Filter by relevant attributes

Filter by relevant attributes

Contact information amp online availabilityContact information amp online availability

Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search

service Searches any data source

accessible through ADOnet or Web Services

Uses Live Communication Server for connectivity options

Aggregated into a single application

LOB Applications with BDC

FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help

desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-

external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing

Why is it Unique Auto Classification Advanced Linguistics text mining for

concept and relationship mapping Recall Lemmatization synonym

expansion wildcards anti-phrasing phonetic search

Precision Exact word matching exact phrase matching proximity tokenization

Location aware results (retail and news) ndash excellent for mobile search

Recommendation engine Increased capacity100-200 million

documents on 1 server and 150 million qsecond

Custom Results

Search scopes
– Allow users to refine search through filtering
– Define content resources and map to business rules/key concepts
– Focused content = shared understanding = more precise results

Duplicate results filtering
– Collapses duplicates from the same directory or site to leave more room for other relevant results
– Less favoritism, more results on the desired page 1

Definitions
– Automatically extracts “definitions” from indexed content and displays them as matches directly on the results page
– A web part property on the Search Best Bets web part (can turn display of definitions on/off)
– Returned in the Query Object Model
– Cannot be edited

Best Bets
– Editorially assigned results based on key concepts assigned to selected query terms
– Can be many-to-many
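The duplicate-collapsing idea can be sketched in a few lines. This is only an illustration of "keep the best hit per site so page 1 has room for other sources"; it is not how MOSS implements the feature, and the function name `collapse_by_site` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def collapse_by_site(ranked_urls, per_site=1):
    """Keep at most `per_site` results per host, preserving rank order."""
    kept = []
    seen = {}  # host -> number of results already kept
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= per_site:
            kept.append(url)
    return kept


# Hypothetical ranked result list: two hits from the same site, one from another.
ranked = [
    "http://hr.example.com/policies/leave.aspx",
    "http://hr.example.com/policies/leave-old.aspx",
    "http://it.example.com/help/vpn.aspx",
]
print(collapse_by_site(ranked))
```

Raising `per_site` trades diversity on page 1 for depth within a single site.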

Scalability

No physical limit for the maximum number of documents in one index
Recommended limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large and small documents count the same
The ‘average document size’ depends on the corpus mix
– i.e., heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware

Security

Query-time stripping – customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
– Implements the ASP.NET 2.0 authentication model
Minimum crawler permission is “Full Read”
– Still provides the same security trimming functionality
– Automatically configured for new sites
Search visibility options
– Prevent sites/lists from appearing in search results at a site/list level
“Security only” crawl for single item add/removal
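Query-time stripping amounts to filtering the ranked results against the user's permissions before display. The sketch below assumes each indexed item carries a list of groups allowed to read it; the field name `acl` and the group names are made up for illustration and do not reflect MOSS internals.

```python
def trim_results(results, user_groups):
    """Return only the results whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [r for r in results if user_groups & set(r["acl"])]


# Hypothetical index entries with the groups permitted to read each item.
results = [
    {"title": "Benefits overview", "acl": ["all-staff"]},
    {"title": "Salary bands 2008", "acl": ["hr-managers"]},
]
visible = trim_results(results, ["all-staff", "engineering"])
print([r["title"] for r in visible])
```

Because the filter runs per query, a permission change takes effect as soon as the stored ACL is refreshed, which is what a security-only crawl is for.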

Search Analytics

Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
– Query success: can define success for certain query terms
Report Center
– Access to MOSS 2007 BI features
– Filters data for permissions and relevance
Key Performance Indicators [KPI]
– Create a KPI list or other measures of success
– Default KPIs exist in OOB deployment
– KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
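Once the search log is exported, a first useful report is the list of queries that returned nothing, since those are candidates for Best Bets or synonyms. A minimal sketch, assuming the export has been reduced to (query, number of results) pairs; that layout is an assumption, not the actual export format.

```python
from collections import Counter


def zero_result_queries(log):
    """Count queries that returned no results, most frequent first."""
    counts = Counter(
        query.strip().lower() for query, n_results in log if n_results == 0
    )
    return counts.most_common()


# Hypothetical rows from an exported search log: (query text, results returned).
log = [("expense form", 12), ("Goat Rodeo", 0), ("goat rodeo", 0), ("vpn", 0)]
print(zero_result_queries(log))
```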

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
– Content creators
– Information Architect/User Experience Architect
– Taxonomist
Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes to map to themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross-functional team to
– Assign relevance weighting that makes sense for customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages structural elements like URL depth and click distance

Pareto’s Principle

Known as the 80/20 rule
Named after a late 19th century economist
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don’t forget the Long Tail
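It is worth checking how closely your own logs follow the 80/20 pattern before deciding where to spend tuning effort. The sketch below, with a hypothetical function name and sample data, reports what fraction of distinct queries accounts for a given share of total query volume.

```python
from collections import Counter


def head_share(queries, coverage=0.8):
    """Fraction of distinct queries needed to cover `coverage` of total volume."""
    counts = sorted(Counter(queries).values(), reverse=True)
    total = sum(counts)
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        if running >= coverage * total:
            return i / len(counts)
    return 1.0  # empty log: nothing to cover


# Hypothetical query log: one dominant query and a short tail.
queries = ["payroll"] * 6 + ["vpn"] * 2 + ["badge", "parking"]
print(head_share(queries))
```

The queries outside that head are the long tail, and they still represent real customers.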

Define Content

Define content scopes
– Segment content into logical groups
– Create scope rules based on
  – Address
  – Property query
  – Content source
– At the SSP level or individual site level
– SSP-level scopes are shared among all sites that use the SSP
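The three rule types can be pictured as simple predicates over a document. This sketch covers include rules only and uses a made-up document shape (`url`, `properties`, `content_source`); real scopes are defined in the SSP or site administration pages, not in code like this.

```python
def in_scope(doc, rules):
    """Return True if the document matches at least one include rule."""
    for kind, value in rules:
        if kind == "address" and doc["url"].lower().startswith(value.lower()):
            return True
        if kind == "property":
            name, expected = value
            if doc["properties"].get(name) == expected:
                return True
        if kind == "content_source" and doc["content_source"] == value:
            return True
    return False


# Hypothetical "HR" scope: anything under the HR site, or tagged Department = HR.
hr_scope = [
    ("address", "http://portal/sites/hr/"),
    ("property", ("Department", "HR")),
]
doc = {
    "url": "http://portal/sites/finance/leave-policy.docx",
    "properties": {"Department": "HR"},
    "content_source": "Portal",
}
print(in_scope(doc, hr_scope))
```

Here the document lives outside the HR site but still lands in the scope through its property, which is the practical benefit of scopes that are not tied to content sources.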

Select authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise, e.g. “goat rodeo”
– Provides additional clarification for the searcher
– Use synonym mapping for term variants, e.g. C# and Csharp
– Two information points can be displayed for a special term
  – Definition of the term
  – Best Bet
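Synonym mapping is easiest to reason about as query expansion: each term is replaced by the set of its variants. A minimal sketch with a hand-built table; the table contents and the function name are illustrative, and in MOSS the mapping is maintained through keywords and the thesaurus rather than code.

```python
# Hypothetical synonym table: each term maps to its accepted variants.
SYNONYMS = {"c#": {"csharp"}, "csharp": {"c#"}}


def expand_query(terms, synonyms=SYNONYMS):
    """Return, for each query term, the sorted list of the term plus its variants."""
    expanded = []
    for term in terms:
        key = term.lower()
        expanded.append(sorted({key} | synonyms.get(key, set())))
    return expanded


print(expand_query(["C#", "tutorial"]))
```

Each inner list would be matched as an OR group, so a search for either spelling finds documents that use the other.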

Designate Authority Sites

Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic-Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on subset of corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query

Educate: Structural Influences

File type bias
– In order of relevancy (highest to lowest): HTML web pages, PowerPoint presentations, Word documents, emails, XML files, Excel spreadsheets, plain text files, list items
Auto language detect
– Foreign-language results are less relevant than results in the user’s language
– English is always considered as relevant as the user’s language
URL depth and click distance
– Short URLs are like prime real estate
– Items with shorter URLs are considered more relevant than items placed at longer URLs
  – The level is determined by counting the number of slash (“/”) characters in the URL
– Keywords separated by hyphens in the URL are good
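Both structural signals are easy to approximate when auditing a site. The sketch below counts path segments as a stand-in for URL depth and uses a breadth-first search to find the fewest clicks from an authoritative page. The function names and the tiny link graph are assumptions; MOSS computes its own values during crawling.

```python
from collections import deque
from urllib.parse import urlsplit


def url_depth(url):
    """Number of non-empty path segments after the host name."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def click_distance(links, authorities):
    """Breadth-first search: fewest clicks from any authority page to each page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to.
links = {"home": ["hr", "it"], "hr": ["leave"], "it": ["leave", "vpn"]}
print(url_depth("http://portal/sites/hr/policies/leave.aspx"))
print(click_distance(links, ["home"]))
```

Pages that never appear in the returned dictionary are unreachable from any authority page, which is a quick way to spot content that gets no click-distance benefit.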

Educate: Content Influences

Anchor link text
– Search indexes the anchor text from the following elements
  – HTML anchor elements
  – SharePoint Services link lists
  – SharePoint Portal Server 2003 listings
  – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  – Any file types handled by installed 3rd-party iFilter components that emit hyperlinks
Metadata extraction
– Shadow title detection is provided within the body of the item
  – Primarily based on text formatting features
  – Shadow title is added automatically to the document
  – Weighted the same as the original title
  – Only for Microsoft Office file types
– Auto description text
Optimized URLs
– Enterprise Search checks URL matching at query time
– If the query matches the host name of a page in the index, that page displays as the first result
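The host-name rule can be pictured as a re-ordering step applied to an already ranked list. This is a sketch of the described behavior only: the function name is hypothetical, and treating the first label of the host as a match is an added assumption.

```python
from urllib.parse import urlsplit


def promote_host_match(query, ranked_urls):
    """Move results whose host name equals the query to the front; keep order otherwise."""
    q = query.strip().lower()

    def is_match(url):
        host = urlsplit(url).netloc.lower()
        # Assumption: "benefits" matches both "benefits" and "benefits.example.com".
        return q == host or q == host.split(".")[0]

    hits = [url for url in ranked_urls if is_match(url)]
    rest = [url for url in ranked_urls if not is_match(url)]
    return hits + rest


# Hypothetical ranked list where the host-name match is not yet first.
ranked = ["http://portal/sites/hr/benefits.aspx", "http://benefits/default.aspx"]
print(promote_host_match("benefits", ranked))
```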

Enhanced Search Results

Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases, your search index is on its own server

Indexing Configuration

Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
– Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
– Crawl rules
– Properties and schema
– Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also note that it is not an easy installation; read the entire documentation before upgrading your portal
– More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2

Use the Web Part that accommodates wildcard search, found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
The Expert search capacity is predicated on the My Sites profile
– Employee participation is critical to optimal functionality
Benefits of click distance are missed if Authority sites are not configured

Dragons 3

The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results delayed from servers without Internet connections
Backward compatibility
– Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
– Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources

Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search, from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

Enterprise Search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find It All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources

MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint, SS Ahmed: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
– Auto-classifies documents for MOSS 2007
– Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
– Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
– http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
– Extracts metadata and compound terms
– Incorporates with existing taxonomy if one exists
– Appends metadata and stores it as a MOSS property
– Part of the main MOSS index
– Uses standard MOSS administration features

Adjusting Relevance: Property weights

Assign different weights to properties so that certain properties, such as ‘Title’, have a bigger influence on ranking
Change default property weights through the Schema Object Model

using System;
using Microsoft.Office.Server.Search.Administration;

Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

// Dump parameters (lookup by name)
foreach (RankingParameter param in ranking.RankingParameters)
{
    RankingParameter lookedup = ranking.RankingParameters[param.Name];
    Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
}

// Lookup by index
for (int i = 0; i < ranking.RankingParameters.Count; i++)
{
    RankingParameter param = ranking.RankingParameters[i];
    Console.WriteLine(param.Name + ": " + param.Value);
}

// Setting the weight of property 'property' to 'weight'
ranking.RankingParameters[property].Value = float.Parse(weight);

// Start the ranking update and poll once a second until it is idle again
ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
Console.Write("Updating ");
while (ranking.Status != RankingUpdateStatus.Idle)
{
    Console.Write(".");
    System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks

Push/Pull Data to Users

Alerts
– Same alerting infrastructure for WSS and MOSS
  – Timer service is used to handle all alert notifications
– Frequency can be set to Daily/Weekly
  – Notifications for search alerts will be sent according to the creation time
– ‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
– A rollup of all of a user’s alerts for a site collection
  – http://<sitecollection>/_layouts/MySubs.aspx
– Alert “gotchas”
  – No “My Alerts Summary” web part
  – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
– Ability to subscribe to an RSS feed on the search results
– ‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents
Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

KeywordQuery request = new KeywordQuery(site);
request.QueryText = strQuery;
request.ResultTypes |= ResultType.RelevantResults;

// If we want to get more than one result table
request.ResultTypes |= ResultType.SpecialTermResults;

// Setting optional parameters on the Query object
request.RowLimit = 10;
request.StartRow = 0;
request.KeywordInclusion = KeywordInclusion.AllKeywords;

// Executing the query
ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
– Emitted by iFilters and Protocol Handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
  – Can be used in a query as property:value
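The many-to-many relationship between crawled and managed properties can be sketched as a lookup table. The property names and sample values below are invented for illustration; in MOSS the mapping is configured in the SSP's metadata property mappings, not in code.

```python
from collections import defaultdict


def build_managed(crawled, mapping):
    """Collect crawled property values under each managed property they map to."""
    managed = defaultdict(list)
    for crawled_name, value in crawled.items():
        for managed_name in mapping.get(crawled_name, ()):
            managed[managed_name].append(value)
    return dict(managed)


# Hypothetical mapping: two crawled properties feed "Author", one also feeds "CreatedBy".
mapping = {
    "office:author": ["Author"],
    "ows_created_by": ["Author", "CreatedBy"],
}
crawled = {"office:author": "Jane Doe", "ows_created_by": "jdoe"}
print(build_managed(crawled, mapping))
```

A query such as `Author:jdoe` would then match on the managed property's friendly name regardless of which crawled property supplied the value.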



Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships

distribution lists and colleagues Group results by social distance Subscribe to People Alerts

People Search Results Page

Find people by project expertise orhellip

Find people by project expertise orhellip

Filter by relevant attributes

Filter by relevant attributes

Contact information amp online availabilityContact information amp online availability

Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search

service Searches any data source

accessible through ADOnet or Web Services

Uses Live Communication Server for connectivity options

Aggregated into a single application

LOB Applications with BDC

FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help

desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-

external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing

Why is it Unique Auto Classification Advanced Linguistics text mining for

concept and relationship mapping Recall Lemmatization synonym

expansion wildcards anti-phrasing phonetic search

Precision Exact word matching exact phrase matching proximity tokenization

Location aware results (retail and news) ndash excellent for mobile search

Recommendation engine Increased capacity100-200 million

documents on 1 server and 150 million qsecond

Custom Results Search Scopes

Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results

Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other

relevant results Less favoritism more results on desired page 1

Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as

matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of

definition) Returned in the Query Object Model Can not be edited

Best Bets Editorially assigned results based on these key concepts assigned to selected

query terms Can be many-to-many

Scalability No physical limit for the maximum number of

documents in one index Recommended document limit is 50 Millions of

documents per indexer A document is anything from a Word or PowerPoint

file to a web page an individual SharePoint list item one people entry or an SAP customer record

Largesmall documents count the same The lsquoaverage document sizersquo depends on the

corpus mixndash ie heavy use of WSS 30 lists versus limited use

Dependent on supporting hardware

Security Query time stripping ndash customer only sees those results

that they have permission to view Support for pluggable authentication for content in

SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model

Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites

Search visibility options Prevent siteslists appearing in search results at a

sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval

Search Analytics Export search logs to Excel

Query terms Page views Number of results returned

Volume trends Query success can define success for

certain query terms Report Center

Access to MOSS 2007 BI features Filters data for permissions and relevance

Key Performance Indicators [KPI] Create a KPI list or other measures of

success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS

2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences
Anchor Link Text
• Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
• Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
• Auto Description text
Optimized URLs
• Enterprise Search checks URL matching at query time
• If the query matches the host name of a page in the index, it will display as the first result
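The host-name rule can be pictured as a re-ordering step applied to an already ranked result list. The sketch below is an assumption-laden model, not the product's logic: the matching rule (query equals the full host name or its first label), the sample URLs, and `promote_host_match` are all hypothetical.

```python
from urllib.parse import urlparse

def promote_host_match(query, ranked_urls):
    """Move URLs whose host name equals the query to the front, keeping order otherwise."""
    term = query.strip().lower()

    def host_matches(url):
        host = (urlparse(url).hostname or "").lower()
        # Assumed rule: match the whole host name or its first label
        return term == host or term == host.split(".")[0]

    matches = [url for url in ranked_urls if host_matches(url)]
    others = [url for url in ranked_urls if not host_matches(url)]
    return matches + others

# Hypothetical ranked results for the query "hrweb"
ranked = [
    "http://intranet/sites/hr/policy.aspx",
    "http://hrweb/default.aspx",
    "http://intranet/news",
]
reordered = promote_host_match("hrweb", ranked)
```

This is why a short, meaningful host name is worth having: a searcher who types the site's name lands on it directly.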

Enhanced Search Results
• Synonym Mapping
• Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
In most cases your search index is on its own server

Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords

Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• It is not an easy installation. Read the entire documentation before upgrading the portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
• Schedule the incremental crawl so that it catches additions to the document set
• Turn on the PDF indexer and stemming

Dragons 2
• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impair "did you mean" capabilities
• The Expert search capability is predicated on the My Sites profile. Employee participation is critical to optimal functionality.
• Benefits of click distance are missed if Authority sites are not configured

Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results are delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with existing taxonomy if one exists
• Appends metadata and stores it as a MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance
Property weights
• Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users
Alerts
• Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
• 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
RSS Feeds
• Ability to subscribe to an RSS feed on the search results
• 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  - Can be used in the query with property:value
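To make the many-to-many mapping concrete, here is a small conceptual model in Python. It is not the MOSS schema API. The crawled property names, the `MAPPINGS` table, and both helper functions are hypothetical, and the "first non-empty source wins" rule is an assumption made for the sketch.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order.
# "office.author" feeds two managed properties, so the mapping is many-to-many.
MAPPINGS = {
    "Author": ["office.author", "mail.from"],
    "Title": ["office.title", "html.title"],
    "Contributor": ["office.author"],
}

def managed_values(crawled, mappings=MAPPINGS):
    """Resolve managed property values from one document's crawled properties."""
    resolved = {}
    for managed, sources in mappings.items():
        for source in sources:  # assumed rule: first non-empty source wins
            if crawled.get(source):
                resolved[managed] = crawled[source]
                break
    return resolved

def matches(term, managed):
    """Evaluate a single 'property:value' query term against managed properties."""
    name, _, value = term.partition(":")
    return value.lower() in str(managed.get(name, "")).lower()

# Hypothetical crawled properties for one document
doc = {"office.author": "Marianne Sweeny", "html.title": "Enterprise Search"}
props = managed_values(doc)
```

With this model, a term such as `Author:sweeny` matches the document through the friendly managed name, regardless of which crawled property supplied the value.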

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources

• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources

• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint, SS Ahmed: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with an existing taxonomy if one exists
• Appends metadata and stores it as a MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance Property weights

• Assign different weights to properties so that certain properties such as ‘Title’ have a bigger influence on ranking
• Change default property weights through the Schema Object Model (using Microsoft.Office.Server.Search.Administration)

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Poll once per second until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  – Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  – Notifications for search alerts will be sent according to the creation time
• ‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user’s alerts for a site collection
  – http://<sitecollection>/_layouts/MySubs.aspx
• Alert “gotchas”
  – No “My Alerts Summary” web part
  – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed on the search results
• ‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  – Can be used in the query with property:Value
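
The many-to-many mapping between crawled and managed properties can be pictured with a small sketch. Everything named here is an assumption made for illustration: the property-set identifiers, the sample values, and the `resolve` and `parse_property_query` helpers are hypothetical and are not part of the MOSS schema API.

```python
# Hypothetical crawled properties, keyed by (property set id, property id).
# The identifiers and values are invented for illustration.
CRAWLED = {
    ("guid-office", "Title"): "Quarterly Report",
    ("guid-web", "title"): "Quarterly Report (web)",
    ("guid-office", "Author"): "A. Author",
}

# Managed property -> ordered list of crawled properties mapped to it.
# One crawled property may feed several managed properties (many-to-many).
MAPPINGS = {
    "Title": [("guid-office", "Title"), ("guid-web", "title")],
    "Author": [("guid-office", "Author")],
    "DisplayName": [("guid-office", "Title")],
}


def resolve(managed_name, crawled=CRAWLED, mappings=MAPPINGS):
    """Return the first available crawled value for a managed property."""
    for key in mappings.get(managed_name, []):
        if key in crawled:
            return crawled[key]
    return None


def parse_property_query(text):
    """Split a 'property:Value' restriction into (property, value)."""
    name, _, value = text.partition(":")
    return name.strip(), value.strip()
```

Keeping the mapping as an ordered list per managed property is one simple way to express "use the first crawled property that has a value"; whether MOSS resolves competing mappings this way is not stated in the slides.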

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto’s Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 6: Share Point2007 Best Practices Final

Search Index A Different Kind of Database

Search Engine Index SQL Server Index

Web Search and Enterprise Search

Web Search
• Publishers want their content to be found
• Anarchistic publishing model = “anyone, anywhere, any time”
• Unlimited document set
• No real standards or code, more like guidelines
• No central authority
• Spam
• Commercialization
• Technology is agnostic
• Has to work the same for everyone worldwide
• No shared understanding

Enterprise Search
• Publishers do not think about document discoverability
• Controlled corpus of documents
• Standards and practices in place
• No spam
• Users and authors generally share contextual understanding
• Customized tagging or metadata
• Can customize search technology to enterprise themes and concepts

Successful enterprise search efforts target corpuses of information and set search scopes appropriately. I&KM pros are wise to study information worker context before trying to “Google-ize” their enterprises. (Forrester Search Wave, Q2 2008)

Advanced Search

• Few customers use it, and those that do are disappointed
• Boolean or SQL operators work sporadically
• Confusing message: what is “regular” search … not as effective?
• Search has progressed beyond the stages of Advanced
  – Filters
  – Facets
  – Context

MOSS 2007 Search

Query engine breaks the search terms down

Index engine stores the properties

Content index stores the text
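
A toy version of these three parts may make the division of labor clearer. This is a minimal sketch under stated assumptions: `TinyIndex`, `tokenize`, and the whitespace word breaking are invented for illustration and say nothing about how MOSS 2007 actually stores its index.

```python
from collections import defaultdict


def tokenize(text):
    """Break text into lowercase terms (a stand-in for a real word breaker)."""
    return [term for term in text.lower().split() if term]


class TinyIndex:
    """Toy index: an inverted content index plus a separate property store."""

    def __init__(self):
        self.content = defaultdict(set)  # term -> ids of documents containing it
        self.properties = {}             # doc id -> dict of property values

    def add(self, doc_id, text, **props):
        for term in tokenize(text):
            self.content[term].add(doc_id)
        self.properties[doc_id] = props

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.content.get(t, set()) for t in terms))
        return sorted(hits)
```

The query side only breaks the query into terms and intersects posting sets; property values stay in their own store and can be read back per document, which mirrors the split between text and properties described above.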

Better Than Ever

MOSS 2007
• Relevance customizable to the enterprise content
  – Automated metadata extraction
  – Enhanced text analysis
• Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
• Single search system and index per server farm
• Custom content groups, Best Bets, scheduling are now shared services
• Scopes can be tied to document properties
• Improved control over indexing

SharePoint 2003
• Relevance keyed on numeric values derived solely from document text
  – Collection frequency
  – Term frequency
  – Document length
  – Term position
• Different systems between Windows SharePoint Services and SharePoint Portal Server
• Multiple indexes
• Custom content groups, Best Bets, scheduling configurations are portal-based
• Scopes tied to content sources
• Index propagated at completion of master crawl only

Simplified Administration UI

Search settings page at the SSP level

Managing crawls
• Content sources
• Explicit SharePoint Content Source Type
• Content source for Business Data (Enterprise CAL)

Crawl logs
• Snapshot of crawled content in your index: lists all documents found in the content source and their status
• Filters by date, site, etc.
• Summary by host name (of successes, errors, and warnings)

Crawl rules
• Included and excluded rules
• Ability to pre-test crawl rules
• Easy to change order of crawl rules

Managing scopes
• Scopes decoupled from content sources
• Scopes can span multiple content sources
• Scope by Property, Site, Content Source, and URL
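
Because crawl rules are ordered, pre-testing a URL against them can be sketched as a first-match-wins lookup. That evaluation order is an assumption made for this sketch, as are the sample patterns and the `is_crawled` helper; the slides only say that rules can be included or excluded, pre-tested, and reordered.

```python
from fnmatch import fnmatchcase

# (pattern, include?) pairs, evaluated top to bottom. All values are invented.
RULES = [
    ("http://intranet/hr/private/*", False),
    ("http://intranet/hr/*", True),
    ("http://intranet/archive/*", False),
]


def is_crawled(url, rules=RULES, default=True):
    """Return the decision of the first rule whose pattern matches the URL."""
    for pattern, include in rules:
        if fnmatchcase(url, pattern):
            return include
    return default
```

Under this assumption, moving the broad `hr/*` rule above the narrower `hr/private/*` rule would silently re-include the private pages, which is the kind of mistake pre-testing is meant to catch.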

Indexing Performance Improvements

• Search is a shared service
  – Unified WSS and MOSS search for 1 index per SSP
  – Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  – Scopes and best bets can also be administered at the consuming sites
• Crawl to small indexes that are then consolidated at scheduled times into a “master merge”
• Content index that holds text of pages, with Property store that holds other document values
• Propagate data incrementally to the query servers as it is being indexed
  – Propagation starts within 30 seconds of the first shadow index written
  – No need to wait till the end of the crawl for information to be available in queries
  – No propagation of properties
• Single item add/removal without re-indexing the entire corpus, with continuous propagation
  – Change Log Crawl: detects what items have changed within a WSS or a MOSS 2007 site and crawls only those items
  – Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed

Relevance Types

• Dynamic ranking = relevance impacted by query term
  – Frequency
  – Location in document
  – Appearance in link text
  – Appearance in URL
• Static ranking = relevance independent of customer query
  – URL Depth
  – Click Distance
  – Authority/Demoted site
  – Change property weights
  – Language of customer (browser setting)
  – Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, Plain text, List items

Relevance Enhancements

• Manually assign synonyms and editorialized results to keywords
  – Use search logs to detect popular searches, low click-through from results, or 0-result queries
• Search Alerts
  – User can subscribe to receive email when results change
• File type filtering
  – Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, txt)
  – Supports 220 file types, from MS and non-MS applications
• Property weights
  – Assign different weights to properties so that important properties such as ‘Title’ have a bigger influence on ranking
  – Change default property weights through the Schema Object Model
  – Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.

MOSS 2007 Faceted Search

Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search

• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that subscribe to OpenSearch 1.1
• Passes the query into the subscribed resource and returns results into a single interface
• Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
• Can develop own FLD files if destination subscribes to OpenSearch 1.1
  – Day Software has developed a standard connector for LiveLink ECM
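
Passing the query to an OpenSearch 1.1 source comes down to filling in a URL template. The sketch below shows that substitution step only; the `fill_template` helper and the example.com template are assumptions made for illustration, and the contents of an actual FLD file are not shown in the slides.

```python
import re
from urllib.parse import quote


def fill_template(template, **params):
    """Substitute OpenSearch 1.1 template parameters such as {searchTerms}.

    Optional parameters (written {name?}) that have no value become empty;
    a missing required parameter raises KeyError.
    """
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", replace, template)
```

For example, `fill_template("http://search.example.com/results?q={searchTerms}&start={startIndex?}", searchTerms="goat rodeo")` URL-encodes the query text and leaves the optional start index empty.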

People Search

Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites

Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find “in-common” connections including managers, site memberships, distribution lists, and colleagues
• Group results by social distance
• Subscribe to People Alerts

People Search Results Page

• Find people by project, expertise, or…
• Filter by relevant attributes
• Contact information & online availability

LOB Applications with BDC

• Extracts data from line-of-business, CRM, and other 3rd-party data stores
• Caches for indexing by search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application

FAST ESP Technology

• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing

Why is it unique?
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news): excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
• Allow users to refine search through filtering
• Define content resources and map to business rules/key concepts
• Focused content = shared understanding = more precise results

Duplicate results filtering
• Collapsing duplicates from the same directory or site to leave more room for other relevant results
• Less favoritism, more results on desired page 1

Definitions
• Automatically extract “definitions” from indexed content and display them as matches directly on the results page
• A web part property on the Search Best Bets web part (can turn on/off display of definition)
• Returned in the Query Object Model
• Cannot be edited

Best Bets
• Editorially assigned results based on these key concepts, assigned to selected query terms
• Can be many-to-many
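
Duplicate collapsing, as described above, can be sketched as keeping only the first ranked result per site for each distinct piece of content. This is a minimal sketch assuming each result carries a content hash; the `url` and `hash` keys and the `collapse_duplicates` helper are hypothetical, and the slides do not say how MOSS detects duplicates.

```python
from urllib.parse import urlsplit


def collapse_duplicates(results):
    """Keep the first result for each (site, content hash) pair.

    `results` is a ranked list of dicts with hypothetical 'url' and 'hash' keys.
    """
    seen = set()
    kept = []
    for item in results:
        key = (urlsplit(item["url"]).netloc, item["hash"])
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept
```

The same document stored on two different sites survives twice in this sketch, since only duplicates from the same site are collapsed.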

Scalability

• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The ‘average document size’ depends on the corpus mix
  – i.e., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security

• Query-time stripping: customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  – Implements ASP.NET 2.0 authentication model
• Minimum crawler permission is “Full Read”
  – Still provides the same security trimming functionality
  – Automatically configured for new sites
• Search visibility options
  – Prevent sites/lists appearing in search results at a site/list level
• “Security only” crawl for single item add/removal
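
Query-time stripping can be illustrated as a filter applied to the ranked results just before they are shown. This is a sketch under a simplifying assumption: each result carries an `acl` list of group names, which is an invented structure and not how MOSS stores permissions.

```python
def trim_results(results, user_groups):
    """Return only the results whose ACL shares at least one group with the user.

    `results` is a ranked list of dicts with a hypothetical 'acl' list of groups.
    Ranking order is preserved.
    """
    groups = set(user_groups)
    return [item for item in results if groups & set(item["acl"])]
```

Because the filter runs per query, a permission change only has to update the stored ACLs, which fits the “security only” crawl mentioned above.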

Search Analytics

Export search logs to Excel
• Query terms
• Page views
• Number of results returned
• Volume trends
• Query success: can define success for certain query terms

Report Center
• Access to MOSS 2007 BI features
• Filters data for permissions and relevance

Key Performance Indicators [KPI]
• Create a KPI list or other measures of success
• Default KPIs exist in OOB deployment
• KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
• Content creators
• Information Architect/User Experience Architect
• Taxonomist

Define key enterprise themes in content
• Map existing content to these themes
• Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points
• Review search logs and customer feedback mechanisms
• What are they trying to find?
• What terms are they using?

Assemble a cross-functional team to
• Assign relevance weighting that makes sense to the customer behavior and the corpus
• Develop Best Bets for searches with 0 results
• Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
• Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
• Design a structure that leverages the structural elements like URL depth and click distance

Pareto’s Principle

• Known as the 80/20 rule
• Named after the late 19th century economist Vilfredo Pareto
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don’t forget the Long Tail
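
Whether the 80/20 split actually holds for a given corpus can be checked against the search logs. The sketch below assumes the log has already been reduced to a list of clicked document identifiers; that input format and the `top_share` helper are assumptions made for illustration.

```python
import math
from collections import Counter


def top_share(clicked_docs, fraction=0.2):
    """Share of all clicks that land on the most-clicked `fraction` of documents."""
    counts = Counter(clicked_docs)
    if not counts:
        return 0.0
    top_n = max(1, math.ceil(len(counts) * fraction))
    top_clicks = sum(n for _, n in counts.most_common(top_n))
    return top_clicks / sum(counts.values())
```

A result well below 0.8 suggests the long tail carries more of the traffic than the rule of thumb implies, so optimizing only the top documents would miss much of it.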

Define Content

Define content scopes
• Segment content into logical groups
• Create scope rule based on
  – Address
  – Property query
  – Content source
• At the SSP level or individual level
• SSP-level scopes are shared among all sites that use the SSP

Select Authority resources

Define special terms if needed
• Terms or language proprietary to the enterprise
  – e.g., “goat rodeo”
• Provides additional clarification for searcher
• Use synonym mapping for term variants
  – C# and Csharp
• Two information points can be displayed for a special term
  – Definition of the term
  – Best Bet
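
Synonym mapping for term variants can be sketched as expanding a query term to every member of its synonym group. The group table and the `expand` helper below are illustrative assumptions; only the C#/Csharp pair comes from the slide.

```python
# Hypothetical synonym groups; "C#" / "Csharp" is the pair mentioned above,
# the second group is invented for illustration.
SYNONYMS = [
    {"c#", "csharp"},
    {"pto", "paid time off"},
]


def expand(term, groups=SYNONYMS):
    """Return the term plus every variant that shares a synonym group with it."""
    term = term.lower()
    variants = {term}
    for group in groups:
        if term in group:
            variants |= group
    return sorted(variants)
```

A searcher typing either variant then matches content written with the other, without the content authors having to agree on one spelling.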

Designate Authority Sites

Hilltop Algorithm
• Quality of links more important than quantity of links
• Segmentation of corpus into broad topics
• Selection of authority sources within these topic areas
• Pre-query calculation applied at query time

Topic Sensitive PageRank
• Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
• Pre-query calculation of factors based on subset of corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query
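
Authority sites also anchor click distance, the number of clicks it takes to reach a page from the nearest authority. A breadth-first search over the link graph is one straightforward way to compute it; the sketch below assumes the link graph is available as a dictionary, and the `click_distance` helper is hypothetical rather than a MOSS API.

```python
from collections import deque


def click_distance(links, authorities):
    """Breadth-first search from the authority pages over outgoing links.

    `links` maps a page to the pages it links to. Returns a dict of
    page -> clicks from the nearest authority; unreachable pages are omitted.
    """
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance
```

With an empty authority list the result is empty, which is one way to picture why unconfigured authority sites leave click distance with nothing to measure from.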

Educate Structural Influences

File Type Bias
• In order of relevancy (highest to lowest)
  – HTML Web pages
  – PowerPoint presentations
  – Word documents
  – Emails
  – XML files
  – Excel spreadsheets
  – Plain text files
  – List items

Auto Language Detect
• Foreign language results are less relevant than results in user’s language
• English language is always considered as relevant as user’s language

URL Depth and Click Distance
• Short URLs are like prime real estate
• Items with shorter URLs are considered more relevant than items placed in longer URLs
  – The level is determined by reviewing the number of slash (“/”) characters in the URL
• Keywords separated by hyphens in the URL are good
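
URL depth is easy to audit across a site before the crawl. The sketch below counts the non-empty path segments after the host name, which is a close variant of counting slash characters; the `url_depth` and `sort_by_depth` helpers are assumptions made for illustration, and the actual weight MOSS gives to depth is not shown in the slides.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the non-empty path segments that follow the host name."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def sort_by_depth(urls):
    """Order URLs from shallowest to deepest (stable for ties)."""
    return sorted(urls, key=url_depth)
```

Running `sort_by_depth` over a list of key pages shows which ones sit deepest and are therefore candidates for a shorter address.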

Educate Content Influences

Anchor Link Text
• Search indexes the anchor text from the following elements
  – HTML anchor elements
  – SharePoint Services link lists
  – SharePoint Portal Server 2003 listings
  – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
• Any file types handled by installed 3rd-party iFilter components which emit hyperlinks

Metadata extraction
• Shadow title detection is provided within the body of the item
  – Primarily based on text formatting features
  – Shadow title is added automatically to the document
  – Weighted the same as the original title
  – Only for Microsoft Office file types
• Auto Description text

Optimized URLs
• Enterprise Search checks URL matching at query time
• If the query matches the host name of a page in the index, it will display as the first result

Enhanced Search Results

• Synonym Mapping
• Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine

In most cases, your search index is on its own server

Indexing Configuration

• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
  – Schedule for off-hours crawling where appropriate
  – Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  – Crawl rules
  – Property and schema
  – Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

• Note the infrastructure update where Microsoft rolled the features of Search Server 2008 into MOSS 2007, which includes federated search ability and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
  – More people destroy their portal than upgrade it due to not reading the documentation and installing the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2

• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
• The Expert search capacity is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click distance are missed if authority sites are not configured


  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Paretorsquos Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 7: Share Point2007 Best Practices Final

Web Search and Enterprise Search

Web Search
• Publishers want their content to be found
• Anarchistic publishing model = "anyone, anywhere, any time"
• Unlimited document set
• No real standards or code, more like guidelines
• No central authority
• Spam
• Commercialization
• Technology is agnostic
• Has to work the same for everyone worldwide
• No shared understanding

Enterprise Search
• Publishers do not think about document discoverability
• Controlled corpus of documents
• Standards and practices in place
• No spam
• Users and authors generally share contextual understanding
• Customized tagging or metadata
• Can customize search technology to enterprise themes and concepts

"Successful enterprise search efforts target corpuses of information and set search scopes appropriately. I&KM pros are wise to study information worker context before trying to 'Google-ize' their enterprises." (Forrester Search Wave, Q2 2008)

Advanced Search
• Few customers use it, and those that do are disappointed
• Boolean or SQL operators work sporadically
• Confusing message: what is "regular" search… not as effective?
• Search has progressed beyond the stages of Advanced
  - Filters
  - Facets
  - Context

MOSS 2007 Search

Query engine breaks the search terms down

Index engine stores the properties

Content index stores the text

Better Than Ever

MOSS 2007
• Relevance customizable to the enterprise content
  - Automated metadata extraction
  - Enhanced text analysis
• Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
  - Single search system and index per server farm
  - Custom content groups, Best Bets and scheduling are now shared services
• Scopes can be tied to document properties
• Improved control over indexing

SharePoint 2003
• Relevance keyed on numeric values derived solely from document text
  - Collection frequency
  - Term frequency
  - Document length
  - Term position
• Different systems between Windows SharePoint Services and SharePoint Portal Server
  - Multiple indexes
  - Custom content groups, Best Bets and scheduling configurations are portal-based
• Scopes tied to content sources
• Index propagated at completion of master crawl only

Simplified Administration UI
Search settings page at the SSP level
• Managing crawls
  - Content sources
  - Explicit SharePoint Content Source Type
  - Content source for Business Data (Enterprise CAL)
• Crawl logs
  - Snapshot of crawled content in your index: lists all documents found in the content source and their status
  - Filters by date, site, etc.
  - Summary by host name (of successes, errors and warnings)
• Crawl rules
  - Included and excluded rules
  - Ability to pre-test crawl rules
  - Easy to change order of crawl rules
• Managing scopes
  - Scopes decoupled from content sources
  - Scopes can span multiple content sources
  - Scope by Property, Site, Content Source and URL

Indexing Performance Improvements
• Search is a shared service
  - Unified WSS and MOSS search for 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and best bets can also be administered at the consuming sites
• Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
• Content index that holds text of pages, with Property store that holds other document values
• Propagate data incrementally as it is being indexed to the query servers
  - Propagation starts within 30 seconds of the first shadow index written
  - No need to wait till the end of the crawl for information to be available in queries
  - No propagation of properties
• Single item add/removal without re-indexing entire corpus, with continuous propagation
  - Change Log Crawl: detects what items have changed within a WSS or a MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed

Relevance Types
• Dynamic ranking = relevance impacted by query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
• Static ranking = relevance independent of customer query
  - URL Depth
  - Click Distance
  - Authority/Demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, Plain text, List items

Relevance Enhancements
• Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0 result queries
• Search Alerts
  - User can subscribe to receive email when results change
• File type filtering
  - Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, txt)
  - Supports 220 file types, MS and non-MS applications
• Property weights
  - Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance
• Marcy Tobin wants me to tell you that this is not a trivial undertaking
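The log review suggested on this slide (popular searches, low click-through, 0 result queries) can be sketched over an exported log. This is a minimal Python illustration, assuming a made-up row layout of (query, results returned, results clicked); the real export columns and the 25% click-through threshold are assumptions, not MOSS values.

```python
from collections import Counter

# Hypothetical log rows: (query text, results returned, results clicked).
LOG = [
    ("expense report", 40, 3),
    ("expense report", 40, 0),
    ("goat rodeo", 0, 0),
    ("vacation policy", 12, 0),
    ("vacation policy", 12, 0),
    ("expense report", 40, 1),
]


def summarize(log, low_ctr=0.25):
    """Return popular queries, zero-result queries and low click-through queries."""
    searches = Counter(q for q, _, _ in log)
    clicked = Counter(q for q, _, clicks in log if clicks > 0)
    zero = sorted({q for q, n, _ in log if n == 0})
    # A query counts as low click-through when fewer than `low_ctr`
    # of its searches led to at least one click.
    low = sorted(
        q for q in searches
        if q not in zero and clicked[q] / searches[q] < low_ctr
    )
    return searches.most_common(), zero, low


popular, zero_results, low_click = summarize(LOG)
print(popular, zero_results, low_click)
```

Zero-result queries are candidates for Best Bets or synonyms; low click-through queries point at results that exist but do not satisfy the searcher.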

MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories

Federated Search
• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that subscribe to OpenSearch 1.1
• Passes the query into the subscribed resource and returns results into single interface
• Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
• Can develop own FLD files if destination subscribes to OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
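Passing the query to an OpenSearch 1.1 source comes down to filling in that source's URL template. The sketch below shows the substitution in Python; `{searchTerms}` and the optional `{startIndex?}` form come from the OpenSearch 1.1 template syntax, while the host name and the `fill_template` helper are hypothetical.

```python
import re
from urllib.parse import quote_plus

# Hypothetical OpenSearch 1.1 URL template; the host is a placeholder.
TEMPLATE = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&format=rss"


def fill_template(template, **params):
    """Substitute template parameters; optional ones ({name?}) may be left empty."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{(\w+)(\?)?\}", replace, template)


print(fill_template(TEMPLATE, searchTerms="enterprise search"))
```

The remote source ranks its own results, which is why the slide notes that relevance follows the originating resource rather than MOSS 2007.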

People Search
Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites
Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections including managers, site memberships, distribution lists and colleagues
• Group results by social distance
• Subscribe to People Alerts

People Search Results Page
• Find people by project, expertise or…
• Filter by relevant attributes
• Contact information & online availability

LOB Applications with BDC
• Extracts data from line-of-business, CRM and other 3rd-party data stores
• Caches for indexing by search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communications Server for connectivity options
• Aggregated into a single application

FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion - language detection - synonyms - spell check - external call-outs - entity extraction - categorization - vectorization - custom navigation - normalizer - alerting - indexing

Why is it Unique
• Auto Classification
• Advanced Linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news): excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results
• Search Scopes
  - Allow users to refine search through filtering
  - Define content resources and map to business rules/key concepts
  - Focused content = shared understanding = more precise results
• Duplicate results filtering
  - Collapsing duplicates from same directory or site to leave more room for other relevant results
  - Less favoritism, more results on desired page 1
• Definitions
  - Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  - A web property on the Search Best Bets web part (can turn on/off display of definition)
  - Returned in the Query Object Model
  - Cannot be edited
• Best Bets
  - Editorially assigned results based on these key concepts assigned to selected query terms
  - Can be many-to-many
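Duplicate collapsing as described on this slide can be illustrated with a few lines of Python. This is a sketch only: it assumes each result carries a content hash and treats the host name as the "site", which is a simplification rather than the way MOSS actually detects duplicates.

```python
from urllib.parse import urlsplit

# Hypothetical ranked results: (url, content hash). Equal hashes mean duplicate content.
RESULTS = [
    ("http://hr/policies/vacation.doc", "a1"),
    ("http://hr/policies/copy-of-vacation.doc", "a1"),
    ("http://hr/policies/benefits.doc", "b2"),
    ("http://finance/archive/vacation.doc", "a1"),
]


def collapse_duplicates(results):
    """Keep the first (best ranked) hit for each (site, content hash) pair."""
    seen = set()
    kept = []
    for url, digest in results:
        key = (urlsplit(url).netloc, digest)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


print(collapse_duplicates(RESULTS))
```

The copy on the same site is collapsed, while the copy on a different site still appears, leaving room on page 1 for other relevant results.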

Scalability
• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  - i.e. heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security
• Query-time stripping: customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
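Query-time stripping means the permission check happens when results are returned, not when content is crawled. A minimal Python sketch of that idea follows; the group names, the `acl` field and the `trim` helper are hypothetical and do not reflect how MOSS stores access control lists.

```python
# Hypothetical index entries: each hit carries the groups allowed to read it.
HITS = [
    {"url": "http://hr/salaries.xls", "acl": {"hr-managers"}},
    {"url": "http://intranet/handbook.doc", "acl": {"all-staff"}},
    {"url": "http://finance/forecast.ppt", "acl": {"finance", "executives"}},
]


def trim(hits, user_groups):
    """Drop every hit the user shares no group with (query-time trimming)."""
    groups = set(user_groups)
    return [hit["url"] for hit in hits if hit["acl"] & groups]


print(trim(HITS, ["all-staff", "finance"]))
```

Because the crawler account reads everything ("Full Read"), the index holds all documents, and the trimming step decides what each customer is shown.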

Search Analytics
• Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
• Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
• Key Performance Indicators [KPI]
  - Create a KPI list or other measures of success
  - Default KPIs exist in OOB deployment
  - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap
• Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
• Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes to map for themes
• Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
• Assemble a cross-functional team to
  - Assign relevance weighting that makes sense to the customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages the structural elements like URL depth and click distance

Pareto's Principle
• Known as the 80/20 rule
• Named after late 19th century economist
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
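Whether the 80/20 split holds for a given corpus can be checked from the search logs. The Python sketch below assumes a made-up tally of how many searches each document answered; the document names and counts are invented for illustration.

```python
# Hypothetical counts of searches answered by each document.
ANSWERED = {
    "handbook": 500, "expenses": 300, "org-chart": 80, "menu": 60, "parking": 30,
    "badge": 15, "fax": 8, "pager": 4, "telex": 2, "misc": 1,
}


def head_share(answered, head_fraction=0.2):
    """Fraction of all searches answered by the top `head_fraction` of documents."""
    counts = sorted(answered.values(), reverse=True)
    head = max(1, round(len(counts) * head_fraction))
    return sum(counts[:head]) / sum(counts)


print(head_share(ANSWERED))
```

In this invented data the top two of ten documents answer 80% of searches; the remaining eight are the Long Tail that still deserves attention.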

Define Content
• Define content scopes
  - Segment content into logical groups
  - Create scope rule based on: Address, Property query, Content source
  - At the SSP level or individual level
  - SSP level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  - Terms or language proprietary to the enterprise, i.e. "goat rodeo"
  - Provides additional clarification for searcher
  - Use synonym mapping for term variants, e.g. C# and Csharp
  - Two information points can be displayed for a special term: definition of the term, Best Bet

Designate Authority Sites
• Hilltop Algorithm
  - Quality of links more important than quantity of links
  - Segmentation of corpus into broad topics
  - Selection of authority sources within these topic areas
  - Pre-query calculation applied at query time
• Topic Sensitive PageRank
  - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  - Pre-query calculation of factors based on subset of corpus
    - Context of term use in document
    - Context of term use in history of queries
    - Context of term use by user submitting query
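Authority sites matter because click distance is measured from them. Assuming click distance means the fewest link clicks needed to reach a page from any designated authority page, a breadth-first search gives a rough picture. The link graph and function below are hypothetical Python illustrations, not the MOSS ranking implementation.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "/": ["/hr", "/finance"],
    "/hr": ["/hr/policies"],
    "/finance": [],
    "/hr/policies": ["/hr/policies/vacation"],
    "/hr/policies/vacation": [],
    "/orphan": ["/hr"],
}


def click_distance(links, authorities):
    """Breadth-first search: fewest clicks from any authority page to each page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


print(click_distance(LINKS, ["/"]))
```

Pages that no authority page can reach (here "/orphan") get no distance at all, which is one way to see why unconfigured authority sites waste this ranking signal.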

Educate Structural Influences
• File Type Bias
  - In order of relevancy (highest to lowest): HTML Web pages, PowerPoint presentations, Word documents, Emails, XML files, Excel spreadsheets, Plain text files, List items
• Auto Language Detect
  - Foreign language results are less relevant than results in user's language
  - English language is always considered as relevant as user's language
• URL Depth and Click Distance
  - Short URLs are like prime real estate
  - Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
  - Keywords separated by hyphens in the URL are good
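The slash-counting rule for URL depth is easy to try out. The Python sketch below counts slashes in the path portion only (leaving out the two in "http://") and ignores a trailing slash; both choices are assumptions made for the example, and the URLs are invented.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the slash characters in the path part of a URL, ignoring a trailing one."""
    path = urlsplit(url).path.rstrip("/")
    return path.count("/")


urls = [
    "http://intranet/hr/policies/2008/vacation-policy.aspx",
    "http://intranet/",
    "http://intranet/hr/vacation-policy.aspx",
]
# Shallowest first: the "prime real estate" ordering.
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```

A page moved from the four-level path to the two-level path would, by this rule alone, be treated as more relevant.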

Educate Content Influences
• Anchor Link Text
  - Search indexes the anchor text from the following elements: HTML anchor elements; SharePoint Services link lists; SharePoint Portal Server 2003 listings; Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
• Metadata extraction
  - Shadow title detection is provided within the body of the item
    - Primarily based on text formatting features
    - Shadow title is added automatically to the document
    - Weighted the same as the original title
    - Only for Microsoft Office file types
  - Auto Description text
• Optimized URLs
  - Enterprise Search checks URL matching at query time
  - If query matches to the host name of a page in the index, it will display as the first result

Enhanced Search Results
• Synonym Mapping
• Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
  - In most cases your search index is on its own server

Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords

Customizing Results Display
To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
• Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
  - More people destroy their portal than upgrade it, because they skip the documentation and the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2
• Use the Web part that accommodates wildcard search. Found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capacity is predicated on the My Sites profile
  - Employee participation critical to optimal functionality
• Benefits of click-distance are missed if Authority sites are not configured

Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers and thesaurus files are not upgraded

Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint, SS Ahmed: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
• Concept Searching
  - Auto-classifies documents for MOSS 2007
  - Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
  - Extracts concepts and weights their relevance to searcher query
    - Presents for search refinement
  - http://www.conceptsearching.com/conceptHMSO (insider trading)
• Integration with MOSS
  - Extracts metadata and compound terms
  - Incorporates with existing taxonomy if one exists
  - Appends metadata and stores as MOSS property
  - Part of the main MOSS index
  - Uses standard MOSS administration features

Adjusting Relevance: Property weights
• Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Wait until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks

Push/Pull Data to Users
• Alerts
  - Same alerting infrastructure for WSS and MOSS: Timer service is used to handle all alert notifications
  - Frequency can be set to Daily/Weekly: notifications for search alerts will be sent according to the creation time
  - 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  - A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx
  - Alert "gotchas"
    - No "My Alerts Summary" web part
    - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
• RSS Feeds
  - Ability to subscribe for an RSS feed on the search results
  - 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part


Page 8: Share Point2007 Best Practices Final

Advanced Search Few customers use it and those that do are

disappointed Boolean or SQL operators work sporadically

Confusing message What is ldquoregularrdquo searchhellipnot as effective

Search has progressed beyond the stages of Advanced Filters Facets Context

MOSS 2007 Search

Query engine breaks the search terms down

Index engine stores the properties

Content index stores the text

Better Than EverMOSS 2007 Relevance customizable to the

enterprise content Automated metadata extraction Enhanced text analysis

Fully integrated admin experience between Windows

SharePoint Services v3 and MOSS 2007 Single search system and index

per server farm Custom content groups Best

Bets scheduling are now shared services

Scopes can be tied to document properties

Improved control over indexing

SharePoint 2003 Relevance keyed on numeric values

derived solely from document text Collection frequency Term frequency Document length Term position

Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best

Bets scheduling configurations are portal-based

Scopes tied to content sources Index propagated at completion of

master crawl only

Simplified Administration UISearch settings page at the SSP levelManaging crawls

bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)

Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the

content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)

Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules

Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL

Indexing Performance Improvements Search is a shared service

ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered

centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites

Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo

Content index that holds text of pages with Property store that holds other document values

Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties

Single item add removal without re-indexing entire corpus with continuous propagation

ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items

ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed

Relevance Types Dynamic ranking = relevance impacted by query term

ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL

Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails

XML files Excel spreadsheets Plain text List items

Relevance EnhancementsManually assign synonyms and editorialized results to keywords

ndash Use search logs to detect popular searches low click-through from results or 0 result queries

Search Alertsndash User can subscribe to receive email when results change

File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)

than others (XML txt)ndash Supports 220 files types MS and non-MS application

Property weights ndash Assign different weights to properties so that important

properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object

Modelndash Note The weights used in the product were carefully tested

Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial

undertaking

MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results

bullCan be presented pre- or post- querybullUsed for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People Search

Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites

Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
• Group results by social distance
• Subscribe to People Alerts

People Search Results Page

• Find people by project, expertise, or…
• Filter by relevant attributes
• Contact information & online availability

LOB Applications with BDC

• Extracts data from line-of-business, CRM, and other third-party data stores
• Caches for indexing by the search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communications Server for connectivity options
• Aggregated into a single application

FAST ESP Technology

• FAST is a sophisticated search engine tailor-made for e-commerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing

Why is it unique?
• Auto-classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news), excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
• Allow users to refine search through filtering
• Define content resources and map them to business rules/key concepts
• Focused content = shared understanding = more precise results

Duplicate results filtering
• Collapses duplicates from the same directory or site to leave more room for other relevant results
• Less favoritism, more results on the desired page 1

Definitions
• Automatically extracts "definitions" from indexed content and displays them as matches directly on the results page
• A web part property on the Search Best Bets web part (can turn display of definitions on/off)
• Returned in the Query Object Model
• Cannot be edited

Best Bets
• Editorially assigned results, based on key concepts, assigned to selected query terms
• Can be many-to-many
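Duplicate results filtering can be pictured as keeping only the first result for each site-and-content pair. The Python sketch below assumes each result already carries a content fingerprint (a hypothetical field standing in for whatever near-duplicate signature the engine computes) and that results arrive in rank order.

```python
from urllib.parse import urlparse


def collapse_duplicates(results):
    """Keep the first result for each (host, content fingerprint) pair."""
    seen = set()
    kept = []
    for result in results:  # assumed to be in rank order already
        key = (urlparse(result["url"]).netloc, result["fingerprint"])
        if key not in seen:
            seen.add(key)
            kept.append(result)
    return kept


results = [
    {"url": "http://portal/hr/policy.doc", "fingerprint": "a1"},
    {"url": "http://portal/hr/archive/policy.doc", "fingerprint": "a1"},
    {"url": "http://teams/hr/policy.doc", "fingerprint": "a1"},
    {"url": "http://portal/hr/benefits.doc", "fingerprint": "b2"},
]

for result in collapse_duplicates(results):
    print(result["url"])
```

The copy on the second site survives because the collapsing is per site, which matches the "same directory or site" wording above.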

Scalability

• No physical limit on the maximum number of documents in one index
• Recommended limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large and small documents count the same
• The 'average document size' depends on the corpus mix
  - e.g., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware
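Because list items, people entries, and business records all count as documents, a rough capacity check is simple arithmetic. The sketch below uses the 50 million figure from the slide; the corpus numbers are invented, and real sizing also depends on hardware.

```python
import math

RECOMMENDED_DOCS_PER_INDEXER = 50_000_000  # figure quoted on the slide


def indexers_needed(files, list_items, people, business_records):
    """Return (total documents, indexers implied by the recommended limit)."""
    total = files + list_items + people + business_records
    return total, max(1, math.ceil(total / RECOMMENDED_DOCS_PER_INDEXER))


# Invented corpus: a list-heavy deployment is dominated by list items.
total, indexers = indexers_needed(
    files=12_000_000,
    list_items=60_000_000,
    people=40_000,
    business_records=5_000_000,
)
print(total, indexers)
```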

Security

• Query-time stripping (security trimming): the customer sees only those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists from appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
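Query-time trimming amounts to filtering the ranked results against the caller's permissions before display. This Python sketch models an ACL as a set of group names, which is a simplification for illustration. The field names are hypothetical.

```python
def security_trim(results, user_groups):
    """Return only the results whose ACL includes one of the user's groups."""
    return [r for r in results if r["acl"] & user_groups]


results = [
    {"title": "Holiday calendar", "acl": {"all-staff"}},
    {"title": "Salary bands", "acl": {"hr"}},
]

visible = security_trim(results, user_groups={"all-staff", "engineering"})
print([r["title"] for r in visible])
```

The index still holds both items; only the result list differs per user.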

Search Analytics

• Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
• Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
• Key Performance Indicators [KPI]
  - Create a KPI list or other measures of success
  - Default KPIs exist in OOB deployment
  - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
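Once the logs are exported, finding zero-result and low click-through queries is a small aggregation. The sketch below assumes each log row is a (query, results returned, clicked) tuple, which is an invented layout, and a 20% click-through threshold chosen only for illustration.

```python
from collections import defaultdict


def summarize_log(rows, ctr_threshold=0.2):
    """Return (zero-result queries, low click-through queries), both sorted."""
    stats = defaultdict(lambda: {"searches": 0, "clicks": 0, "zero": 0})
    for query, result_count, clicked in rows:
        entry = stats[query.lower()]
        entry["searches"] += 1
        entry["clicks"] += 1 if clicked else 0
        entry["zero"] += 1 if result_count == 0 else 0

    zero_result = sorted(q for q, s in stats.items() if s["zero"] == s["searches"])
    low_ctr = sorted(
        q
        for q, s in stats.items()
        if s["zero"] < s["searches"] and s["clicks"] / s["searches"] < ctr_threshold
    )
    return zero_result, low_ctr


rows = [
    ("goat rodeo", 0, False),
    ("Vacation policy", 12, False),
    ("vacation policy", 12, False),
    ("expense report", 30, True),
]
print(summarize_log(rows))
```

Zero-result queries are candidates for Best Bets or synonyms. Low click-through queries suggest the right content exists but ranks poorly.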

Configuring MOSS 2007 Search

Search Roadmap

• Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
• Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes that map to the themes
• Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
• Assemble a cross-functional team to
  - Assign relevance weighting that makes sense for the customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle

• Known as the 80/20 rule
• Named after the late 19th century economist Vilfredo Pareto
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
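Whether the 80/20 split actually holds for a given corpus can be checked from click counts per document. The following sketch uses invented counts and simply measures what share of all clicks lands on the top 20% of documents.

```python
def head_share(click_counts, head_fraction=0.2):
    """Share of all clicks captured by the most-clicked head_fraction of documents."""
    ranked = sorted(click_counts, reverse=True)
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


# Invented click counts for ten documents.
clicks = [400, 400, 40, 40, 30, 30, 20, 20, 10, 10]
print(head_share(clicks))
```

The remaining share is the Long Tail: individually rare, but together a meaningful part of what people look for.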

Define Content

• Define content scopes
  - Segment content into logical groups
  - Create scope rules based on address, property query, or content source
  - At the SSP level or individual level
  - SSP-level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  - Terms or language proprietary to the enterprise, e.g., "goat rodeo"
  - Provides additional clarification for the searcher
  - Use synonym mapping for term variants, e.g., C# and Csharp
  - Two information points can be displayed for a special term: the definition of the term and a Best Bet
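The three kinds of scope rule (address, property query, content source) can be sketched as simple predicates over an indexed item. This Python example is an illustration with hypothetical field names and covers only "include" rules. MOSS scope rules also have require and exclude behaviors, which are left out here.

```python
def in_scope(item, rules):
    """Return True when any include rule matches the item."""
    for kind, value in rules:
        if kind == "address" and item["url"].startswith(value):
            return True
        if kind == "property":
            name, expected = value
            if item["properties"].get(name) == expected:
                return True
        if kind == "content_source" and item["content_source"] == value:
            return True
    return False


item = {
    "url": "http://portal/hr/benefits.aspx",
    "properties": {"ContentType": "Policy"},
    "content_source": "Intranet",
}

hr_scope = [("address", "http://portal/hr/")]
policy_scope = [("property", ("ContentType", "Policy"))]
file_share_scope = [("content_source", "File Shares")]

print(in_scope(item, hr_scope), in_scope(item, policy_scope), in_scope(item, file_share_scope))
```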

Designate Authority Sites

Hilltop Algorithm
• Quality of links is more important than quantity of links
• Segmentation of the corpus into broad topics
• Selection of authority sources within these topic areas
• Pre-query calculation applied at query time

Topic-Sensitive PageRank
• Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
• Pre-query calculation of factors based on a subset of the corpus
  - Context of term use in the document
  - Context of term use in the history of queries
  - Context of term use by the user submitting the query
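Designating authority sites matters because click distance is measured from them. A minimal way to picture it is a breadth-first walk over the link graph starting at the authority pages. The graph and page names below are invented, and the sketch shows the idea, not the product's implementation.

```python
from collections import deque


def click_distances(links, authority_pages):
    """Breadth-first search: fewest clicks from any authority page to each page."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Invented link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/hr/benefits", "/it/vpn"],
}
print(click_distances(links, ["/"]))
```

Pages that cannot be reached from any authority page get no distance at all, which is one way to see why leaving authority sites unconfigured wastes this signal.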

Educate: Structural Influences

File Type Bias
• In order of relevancy (highest to lowest)
  - HTML web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items

Auto Language Detect
• Foreign-language results are less relevant than results in the user's language
• English is always considered as relevant as the user's language

URL Depth and Click Distance
• Short URLs are like prime real estate
• Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by counting the slash ("/") characters in the URL
• Keywords separated by hyphens in the URL are good
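URL depth is easy to check for your own pages. The sketch below counts slashes in the path part of the URL only, ignoring the "//" after the scheme and any trailing slash. That is an assumption about how the count is taken, and the URLs are invented.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the slashes in the path portion of the URL."""
    return urlparse(url).path.rstrip("/").count("/")


for url in (
    "http://portal/",
    "http://portal/hr",
    "http://portal/hr/benefits/2008/plan.aspx",
):
    print(url_depth(url), url)
```

Sorting a site's key pages by this number is a quick way to spot important content buried several levels down.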

Educate: Content Influences

Anchor Link Text
• Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed third-party iFilter components that emit hyperlinks

Metadata extraction
• Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
• Auto Description text

Optimized URLs
• Enterprise Search checks URL matching at query time
• If the query matches the host name of a page in the index, that page is displayed as the first result

Enhanced Search Results

• Synonym Mapping
• Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration

• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

• Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
  - More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on the PDF indexer and stemming

Dragons 2

• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capability is predicated on the My Sites profile
  - Employee participation is critical to optimal functionality
• Benefits of click distance are missed if Authority sites are not configured

Dragons 3

• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results are delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources

• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search, from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources

• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with an existing taxonomy if one exists
• Appends metadata and stores it as a MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance: Property Weights

• Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are supplied by the caller (not shown on the slide).
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name.
    // The separator string was lost in extraction; ": " is assumed here.
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Set the weight of the parameter named 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
• 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed on the search results
• 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

• Connects to a content source and enumerates the documents
• Ships with support for Web content, NTFS file shares, Exchange Public Folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
• Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    // KeywordQuery and related types live in Microsoft.Office.Server.Search.Query.
    // site and strQuery are supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  - Can be used in the query as property:value
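The mapping from crawled to managed properties can be pictured as a lookup table in which each managed property lists the crawled properties that may feed it. The sketch below is illustrative only. The crawled property names are hypothetical, and taking the first mapped value that is present is an assumption made for this example.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order.
MANAGED_TO_CRAWLED = {
    "Author": ["Office:4", "ows_Author"],
    "Title": ["Office:2", "ows_Title"],
}


def resolve_managed(crawled_values, mapping=MANAGED_TO_CRAWLED):
    """Pick, for each managed property, the first mapped crawled value present."""
    managed = {}
    for name, sources in mapping.items():
        for crawled in sources:
            if crawled in crawled_values:
                managed[name] = crawled_values[crawled]
                break
    return managed


# Values as an iFilter or protocol handler might emit them for one document.
crawled = {"ows_Author": "M. Sweeny", "Office:2": "Best Practices"}
print(resolve_managed(crawled))
```

A query such as Author:Sweeny then targets the managed property by its friendly name, whichever crawled property supplied the value.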


ndash Use search logs to detect popular searches low click-through from results or 0 result queries

Search Alertsndash User can subscribe to receive email when results change

File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)

than others (XML txt)ndash Supports 220 files types MS and non-MS application

Property weights ndash Assign different weights to properties so that important

properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object

Modelndash Note The weights used in the product were carefully tested

Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial

undertaking

MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results

bullCan be presented pre- or post- querybullUsed for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People SearchBuild and publish rich personal profiles

Customize personal profile attributes Populate personal profiles using information from Active Directory other

LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory

information Publish personal profiles using MOSS My Sites

Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships

distribution lists and colleagues Group results by social distance Subscribe to People Alerts

People Search Results Page

Find people by project expertise orhellip

Find people by project expertise orhellip

Filter by relevant attributes

Filter by relevant attributes

Contact information amp online availabilityContact information amp online availability

Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search

service Searches any data source

accessible through ADOnet or Web Services

Uses Live Communication Server for connectivity options

Aggregated into a single application

LOB Applications with BDC

FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help

desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-

external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing

Why is it Unique Auto Classification Advanced Linguistics text mining for

concept and relationship mapping Recall Lemmatization synonym

expansion wildcards anti-phrasing phonetic search

Precision Exact word matching exact phrase matching proximity tokenization

Location aware results (retail and news) ndash excellent for mobile search

Recommendation engine Increased capacity100-200 million

documents on 1 server and 150 million qsecond

Custom Results Search Scopes

Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results

Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other

relevant results Less favoritism more results on desired page 1

Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as

matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of

definition) Returned in the Query Object Model Can not be edited

Best Bets Editorially assigned results based on these key concepts assigned to selected

query terms Can be many-to-many

Scalability No physical limit for the maximum number of

documents in one index Recommended document limit is 50 Millions of

documents per indexer A document is anything from a Word or PowerPoint

file to a web page an individual SharePoint list item one people entry or an SAP customer record

Largesmall documents count the same The lsquoaverage document sizersquo depends on the

corpus mixndash ie heavy use of WSS 30 lists versus limited use

Dependent on supporting hardware

Security Query time stripping ndash customer only sees those results

that they have permission to view Support for pluggable authentication for content in

SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model

Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites

Search visibility options Prevent siteslists appearing in search results at a

sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval

Search Analytics Export search logs to Excel

Query terms Page views Number of results returned

Volume trends Query success can define success for

certain query terms Report Center

Access to MOSS 2007 BI features Filters data for permissions and relevance

Key Performance Indicators [KPI] Create a KPI list or other measures of

success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS

2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences

Anchor Link Text
Search indexes the anchor text from the following elements:
- HTML anchor elements
- SharePoint Services link lists
- SharePoint Portal Server 2003 listings
- Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
- Any file types handled by installed 3rd-party iFilter components that emit hyperlinks

Metadata extraction
Shadow title detection is provided within the body of the item
- Primarily based on text formatting features
- Shadow title is added automatically to the document
- Weighted the same as the original title
- Only for Microsoft Office file types

Auto Description text

Optimized URLs
- Enterprise Search checks URL matching at query time
- If the query matches the host name of a page in the index, that page is displayed as the first result

Enhanced Search Results

Synonym Mapping
Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> "Add Keyword"

Hardware Considerations

Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration

Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
- Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review:
- Crawl rules
- Properties and schema
- Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
- It is not an easy installation. Read the entire documentation for it before upgrading your portal.
- More people destroy their portal than upgrade it, because they did not read the documentation and install the prerequisite patches.

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2

Use the Web Part that accommodates wildcard search, found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities

The Expert search capability is predicated on the My Sites profile
- Employee participation is critical to optimal functionality

Benefits of click distance are missed if Authority sites are not configured

Dragons 3

The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results delayed from servers without Internet connections
Backward compatibility
- Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
- Index files, scopes, search alerts, filters, word breakers and thesaurus files are not upgraded

Resources

Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources

MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
- Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
- http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS
- Extracts metadata and compound terms
- Incorporates with existing taxonomy if one exists
- Appends metadata and stores it as a MOSS property
- Part of the main MOSS index
- Uses standard MOSS administration features

Adjusting Relevance: Property weights

Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking

Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    // (both are strings supplied by the caller)
    ranking.RankingParameters[property].Value = float.Parse(weight);

    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        // Poll once per second until the ranking update finishes
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
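To see why a change in property weights can reorder results, here is a small Python sketch of weighted term counting. It is a toy model, not the MOSS ranking function; the `WEIGHTS` values, the `weighted_score` function and the two sample documents are all assumptions made for illustration.

```python
# Hypothetical weights; the actual MOSS defaults are not given in this deck.
WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_score(doc: dict[str, str], term: str, weights: dict[str, float]) -> float:
    """Sum of (property weight x occurrences of the term in that property)."""
    term = term.lower()
    score = 0.0
    for prop, weight in weights.items():
        occurrences = doc.get(prop, "").lower().split().count(term)
        score += weight * occurrences
    return score


if __name__ == "__main__":
    docs = [
        {"title": "Vacation policy", "body": "How to request leave"},
        {"title": "Newsletter", "body": "Vacation photos and vacation tips"},
    ]
    for doc in sorted(docs, key=lambda d: weighted_score(d, "vacation", WEIGHTS), reverse=True):
        print(weighted_score(doc, "vacation", WEIGHTS), doc["title"])
```

With the title weighted three times the body, the policy page outranks the newsletter; with equal weights the order flips. That sensitivity is the reason to test any weight change against real queries before rolling it out.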

Push/Pull Data to Users

Alerts
Same alerting infrastructure for WSS and MOSS
- Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
- Notifications for search alerts will be sent according to the creation time
'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user's alerts for a site collection
- http://<sitecollection>/_layouts/MySubs.aspx
Alert "gotchas"
- No "My Alerts Summary" web part
- No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for WSS alert types

RSS Feeds
Ability to subscribe to an RSS feed on the search results
'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents

Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles and Business Data Catalog

Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others

http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
- Emitted by iFilters and Protocol Handlers
- Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
- Mapping target for crawled properties (many-to-many)
- Identified by internal ID
- Friendly name used in queries
  - Can be used in the query with property:value syntax
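The crawled-to-managed mapping can be pictured as a lookup table. The Python sketch below is a simplified model, not the MOSS schema API; the crawled property names (`doc_author`, `mail_from`, and so on), the `MAPPINGS` table and the first-match-wins rule are assumptions for illustration.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order.
MAPPINGS = {
    "Author": ["doc_author", "mail_from"],
    "Title": ["doc_title", "mail_subject"],
}


def to_managed(crawled: dict[str, str], mappings: dict[str, list[str]]) -> dict[str, str]:
    """Resolve managed property values from whatever crawled properties an item emitted."""
    managed: dict[str, str] = {}
    for managed_name, crawled_names in mappings.items():
        for name in crawled_names:
            if name in crawled:
                managed[managed_name] = crawled[name]
                break  # first mapped crawled property with a value wins
    return managed


if __name__ == "__main__":
    email = {"mail_from": "pat", "mail_subject": "Q3 plan"}
    print(to_managed(email, MAPPINGS))  # {'Author': 'pat', 'Title': 'Q3 plan'}
```

Because documents and emails land on the same managed property, one friendly name in a query covers both content types.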

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping

Better Than Ever

MOSS 2007
Relevance customizable to the enterprise content
- Automated metadata extraction
- Enhanced text analysis
Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
- Single search system and index per server farm
- Custom content groups, Best Bets and scheduling are now shared services
Scopes can be tied to document properties
Improved control over indexing

SharePoint 2003
Relevance keyed on numeric values derived solely from document text
- Collection frequency
- Term frequency
- Document length
- Term position
Different systems between Windows SharePoint Services and SharePoint Portal Server
- Multiple indexes
- Custom content groups, Best Bets and scheduling configurations are portal-based
Scopes tied to content sources
Index propagated at completion of master crawl only

Simplified Administration UI

Search settings page at the SSP level

Managing crawls
- Content sources
- Explicit SharePoint Content Source Type
- Content source for Business Data (Enterprise CAL)

Crawl logs
- Snapshot of crawled content in your index: lists all documents found in the content source and their status
- Filters by date, site, etc.
- Summary by host name (of successes, errors and warnings)

Crawl rules
- Included and excluded rules
- Ability to pre-test crawl rules
- Easy to change order of crawl rules

Managing scopes
- Scopes decoupled from content sources
- Scopes can span multiple content sources
- Scope by Property, Site, Content Source and URL

Indexing Performance Improvements

Search is a shared service
- Unified WSS and MOSS search for 1 index per SSP
- Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
- Scopes and best bets can also be administered at the consuming sites

Crawl to small indexes that are then consolidated at scheduled times into a "master merge"

Content index that holds text of pages, with Property store that holds other document values

Propagate data incrementally to the query servers as it is being indexed
- Propagation starts within 30 seconds of the first shadow index written
- No need to wait till the end of the crawl for information to be available in queries
- No propagation of properties

Single-item add/removal without re-indexing the entire corpus, with continuous propagation
- Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
- Security Change Only Crawl: no need to fully index all the content of a site when permissions on the site have changed
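The change-log idea can be sketched in a few lines of Python. This is a conceptual model only, not the WSS change log API; the `(action, url, text)` entry format and the `incremental_crawl` function are assumptions for illustration.

```python
def incremental_crawl(index: dict[str, str], change_log: list[tuple[str, str, str]]) -> int:
    """Apply (action, url, text) entries to the index; return how many items were touched."""
    touched = 0
    for action, url, text in change_log:
        if action == "delete":
            if index.pop(url, None) is not None:
                touched += 1
        else:  # "add" or "update": only this one item is (re)indexed
            index[url] = text
            touched += 1
    return touched


if __name__ == "__main__":
    index = {"/a": "old text", "/b": "to be removed"}
    log = [("update", "/a", "new text"), ("delete", "/b", ""), ("add", "/c", "fresh")]
    print(incremental_crawl(index, log), index)
```

The cost of the crawl scales with the number of changes in the log rather than with the size of the corpus, which is what makes frequent incremental crawls affordable.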

Relevance Types

Dynamic ranking = relevance impacted by query term
- Frequency
- Location in document
- Appearance in link text
- Appearance in URL

Static ranking = relevance independent of customer query
- URL Depth
- Click Distance
- Authority/Demoted site
- Change property weights
- Language of customer (browser setting)
- Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items

Relevance Enhancements

Manually assign synonyms and editorialized results to keywords
- Use search logs to detect popular searches, low click-through from results, or 0-result queries

Search Alerts
- User can subscribe to receive email when results change

File type filtering
- Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, txt)
- Supports 220 file types, MS and non-MS applications

Property weights
- Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
- Change default property weights through the Schema Object Model
- Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
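Mining search logs for keyword and Best Bet candidates can start very simply. The Python sketch below assumes a log already exported as `(query, result_count, clicked)` tuples; that format, the thresholds and the `find_problem_queries` function are hypothetical, not the MOSS query log schema.

```python
from collections import Counter


def find_problem_queries(log, min_searches=2, max_click_rate=0.2):
    """Return (zero_result_queries, low_click_through_queries), each sorted."""
    searches: Counter = Counter()
    clicks: Counter = Counter()
    zero_results = set()
    for query, result_count, clicked in log:
        q = query.lower()
        searches[q] += 1
        if clicked:
            clicks[q] += 1
        if result_count == 0:
            zero_results.add(q)
    low_click = sorted(
        q for q, n in searches.items()
        if n >= min_searches and q not in zero_results and clicks[q] / n <= max_click_rate
    )
    return sorted(zero_results), low_click


if __name__ == "__main__":
    log = [
        ("goat rodeo", 0, False),
        ("expense form", 12, False),
        ("expense form", 12, False),
        ("org chart", 5, True),
        ("org chart", 5, True),
    ]
    print(find_problem_queries(log))
```

Zero-result queries are candidates for synonyms or Best Bets; popular queries that nobody clicks through suggest the right document exists but ranks poorly.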

MOSS 2007 Faceted Search

Facets are predetermined content categories presented to the customer to narrow search results
- Can be presented pre- or post-query
- Used for Advanced search

Empowers customers to refine their search most effectively

Filters results by predetermined categories

Federated Search

Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that subscribe to OpenSearch 1.1
Passes the query into the subscribed resource and returns results into a single interface
Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop own FLD files if the destination subscribes to OpenSearch 1.1
- Day Software has developed a standard connector for LiveLink ECM
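Passing the query to a federated location comes down to filling in an OpenSearch 1.1 URL template. The Python sketch below handles the `{name}` and optional `{name?}` placeholders from that specification; the `fill_template` function and the example.com template are assumptions, and the sketch does not read FLD files.

```python
import re
from urllib.parse import quote_plus


def fill_template(template: str, params: dict[str, str]) -> str:
    """Substitute OpenSearch-style {name} and optional {name?} placeholders."""
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(params[name])
        if optional:
            return ""  # optional parameters may be replaced with an empty string
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


if __name__ == "__main__":
    # Hypothetical federated location; a real template comes from the location's definition.
    template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}"
    print(fill_template(template, {"searchTerms": "goat rodeo"}))
```

Since ranking happens at the remote location, the template is the only contract between the two systems: the query goes out, and results come back already ordered.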

People Search

Build and publish rich personal profiles
- Customize personal profile attributes
- Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
- Control access to information using security and privacy controls
- Generate and display organizational charts based on directory information
- Publish personal profiles using MOSS My Sites

Identify people who can help
- Find people based on keyword matches with MOSS personal profiles
- Find people in line-of-business systems
- Filter results by common attributes such as Job Title or Department
- Find "in-common" connections including managers, site memberships, distribution lists and colleagues
- Group results by social distance
- Subscribe to People Alerts

People Search Results Page

Find people by project, expertise or…
Filter by relevant attributes
Contact information & online availability

LOB Applications with BDC

Extracts data from line-of-business, CRM and other 3rd-party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application

FAST ESP Technology

FAST is a sophisticated search engine tailor-made for ecommerce and help desk
- Uses sophisticated linguistic processing
- Searches structured and unstructured content
- Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing

Why is it unique?
- Auto classification
- Advanced linguistics: text mining for concept and relationship mapping
- Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
- Precision: exact word matching, exact phrase matching, proximity, tokenization
- Location-aware results (retail and news), excellent for mobile search
- Recommendation engine
- Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
- Allow users to refine search through filtering
- Define content resources and map to business rules/key concepts
- Focused content = shared understanding = more precise results

Duplicate results filtering
- Collapsing duplicates from the same directory or site to leave more room for other relevant results
- Less favoritism, more results on desired page 1

Definitions
- Automatically extract "definitions" from indexed content and display them as matches directly on the results page
- A web part property on the Search Best Bets web part (can turn on/off display of definition)
- Returned in the Query Object Model
- Cannot be edited

Best Bets
- Editorially assigned results based on these key concepts, assigned to selected query terms
- Can be many-to-many
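Duplicate collapsing can be pictured as keeping the first hit for each combination of location and content. The Python sketch below assumes each result already carries a `content_hash` field and treats "same directory" as same host plus same parent folder; both are simplifications for illustration, not the MOSS duplicate-detection logic.

```python
import posixpath
from urllib.parse import urlparse


def collapse_duplicates(results: list[dict]) -> list[dict]:
    """Keep the first result per (directory, content hash); input is in rank order."""
    seen = set()
    kept = []
    for result in results:
        parsed = urlparse(result["url"])
        directory = (parsed.netloc, posixpath.dirname(parsed.path))
        key = (directory, result["content_hash"])
        if key not in seen:
            seen.add(key)
            kept.append(result)
    return kept


if __name__ == "__main__":
    results = [
        {"url": "http://portal/docs/a.doc", "content_hash": "h1"},
        {"url": "http://portal/docs/copy-of-a.doc", "content_hash": "h1"},
        {"url": "http://portal/archive/a.doc", "content_hash": "h1"},
    ]
    for r in collapse_duplicates(results):
        print(r["url"])
```

The copy in the same folder is collapsed, while the archived copy in another directory survives, so page 1 has room for other relevant results.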

Scalability

No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
- Large and small documents count the same
The 'average document size' depends on the corpus mix
- e.g. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware

Security

Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
- Implements ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
- Still provides the same security trimming functionality
- Automatically configured for new sites
Search visibility options
- Prevent sites/lists appearing in search results at a site/list level
"Security only" crawl for single-item add/removal
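Query-time trimming amounts to filtering the ranked list against what the current user may read. The Python sketch below models permissions as a set of group names per result; that model, the `allowed` field and the `trim_results` function are assumptions for illustration, not how MOSS stores ACLs.

```python
def trim_results(results: list[dict], user_groups: set[str]) -> list[dict]:
    """Return only the results whose allowed groups overlap the user's groups."""
    return [r for r in results if r["allowed"] & user_groups]


if __name__ == "__main__":
    results = [
        {"title": "Benefits overview", "allowed": {"everyone"}},
        {"title": "Salary bands", "allowed": {"hr"}},
        {"title": "Server passwords", "allowed": {"it-admins"}},
    ]
    for r in trim_results(results, {"everyone", "hr"}):
        print(r["title"])
```

Trimming preserves rank order and simply drops what the user cannot open, so two people running the same query can see different result counts.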

Search Analytics

Export search logs to Excel
- Query terms
- Page views
- Number of results returned
- Volume trends
- Query success: can define success for certain query terms

Report Center
- Access to MOSS 2007 BI features
- Filters data for permissions and relevance

Key Performance Indicators [KPI]
- Create a KPI list or other measures of success
- Default KPIs exist in OOB deployment
- KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
- Content creators
- Information Architect/User Experience Architect
- Taxonomist

Define key enterprise themes in content
- Map existing content to these themes
- Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points
- Review search logs and customer feedback mechanisms
- What are they trying to find?
- What terms are they using?

Assemble a cross-functional team to
- Assign relevance weighting that makes sense for the customer behavior and the corpus
- Develop Best Bets for searches with 0 results
- Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
- Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
- Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle

Known as the 80/20 rule

Named after the late 19th century economist Vilfredo Pareto

20% of your content is answering 80% of your searches

Not an excuse to stop optimizing at the top 20%. Don't forget the Long Tail.
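Whether the 80/20 split actually holds for your corpus is easy to measure from click data. The Python sketch below computes the share of hits that go to the top fraction of documents; the `head_share` function and the hit counts are made up for illustration.

```python
def head_share(hits_per_doc: list[int], head_fraction: float = 0.2) -> float:
    """Share of all hits captured by the top `head_fraction` of documents."""
    ranked = sorted(hits_per_doc, reverse=True)
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


if __name__ == "__main__":
    # Hypothetical hit counts for ten documents.
    hits = [400, 400, 50, 40, 30, 25, 20, 15, 10, 10]
    print(f"Top 20% of documents answer {head_share(hits):.0%} of searches")
```

The remaining documents are the long tail: individually rare, but together they still account for a meaningful share of searches.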

Define Content

Define content scopes
Segment content into logical groups
Create scope rule based on
- Address
- Property query
- Content source
At the SSP level or the individual site level
- SSP-level scopes are shared among all sites that use the SSP

Select authority resources

Define special terms if needed
Terms or language proprietary to the enterprise
- e.g. "goat rodeo"
Provides additional clarification for the searcher
Use synonym mapping for term variants
- C# and Csharp
Two information points can be displayed for a special term
- Definition of the term
- Best Bet
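The three scope rule types (address, property query, content source) can be modeled as simple predicates. The Python sketch below is a simplification: real MOSS rules also carry include/require/exclude behavior, which is omitted here, and the rule tuples, item fields and `in_scope` function are hypothetical.

```python
def in_scope(item: dict, rules: list[tuple[str, str, str]]) -> bool:
    """Rules are (kind, key, value) tuples; an item is in scope if any rule matches."""
    for kind, key, value in rules:
        if kind == "address" and item["url"].startswith(value):
            return True
        if kind == "property" and item.get("properties", {}).get(key) == value:
            return True
        if kind == "content_source" and item.get("content_source") == value:
            return True
    return False


if __name__ == "__main__":
    item = {
        "url": "http://portal/sites/hr/leave.aspx",
        "properties": {"ContentType": "Policy"},
        "content_source": "Intranet",
    }
    hr_scope = [("address", "", "http://portal/sites/hr/")]
    policy_scope = [("property", "ContentType", "Policy")]
    print(in_scope(item, hr_scope), in_scope(item, policy_scope))
```

Because property rules do not depend on where content lives, a scope such as "Policies" can span several sites and content sources.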

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources MOSS 2007 Administrator Documentation

httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3

SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links

All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx

Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx

MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx

MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx

Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products Concept Searching

Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish

multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher

queryndash Presents for search refinement

httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features

Adjusting Relevance: Property Weights

• Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
• Change default property weights through the Schema Object Model (using Microsoft.Office.Server.Search.Administration)

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext( appGuid ));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it has finished
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
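To make the effect of property weights concrete, here is a minimal Python sketch. It is a toy model, not the MOSS ranking function: the `weighted_score` and `rank` helpers, the property names, and the weight values are all assumptions chosen for illustration.

```python
def weighted_score(document, term, weights):
    """Sum per-property term counts, each multiplied by that property's weight."""
    term = term.lower()
    total = 0.0
    for prop, weight in weights.items():
        words = document.get(prop, "").lower().split()
        total += weight * words.count(term)
    return total


def rank(documents, term, weights):
    """Return documents ordered from highest to lowest weighted score."""
    return sorted(
        documents,
        key=lambda doc: weighted_score(doc, term, weights),
        reverse=True,
    )


# Hypothetical two-document corpus.
DOCS = [
    {"id": "a", "title": "Search roadmap", "body": "planning notes"},
    {"id": "b", "title": "Agenda", "body": "search tips and search logs"},
]
```

With a heavy title weight, document "a" ranks first; shifting weight to the body puts "b" first. That sensitivity is why careless weight changes can hurt relevance.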

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  - The timer service is used to handle all alert notifications
• Frequency can be set to Daily or Weekly
  - Notifications for search alerts are sent according to the creation time
• The 'Alert Me' link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed of the search results
• The 'RSS' link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

• Connect to a content source and enumerate the documents
• Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and the Business Data Catalog
• Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and protocol handlers
• Identified by a property set (GUID) and a property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by an internal ID
• Friendly name used in queries
  - Can be used in the query with property:value
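The many-to-many mapping above can be pictured with a small Python sketch. It is an illustrative model only, not the MOSS Schema Object Model: the GUID strings, the property names, the `resolve_managed` and `matches` helpers, and the "first mapped crawled property with a value wins" rule are all assumptions.

```python
# Hypothetical crawled properties: (property set GUID, property ID) pairs.
TITLE_OFFICE = ("guid-office", "Title")
TITLE_HTML = ("guid-web", "title")
AUTHOR_OFFICE = ("guid-office", "Author")

# Managed property (friendly name) -> crawled properties mapped to it.
# One crawled property may feed several managed properties, and one
# managed property may draw on several crawled properties.
MAPPINGS = {
    "Title": [TITLE_OFFICE, TITLE_HTML],
    "Author": [AUTHOR_OFFICE],
    "DisplayName": [TITLE_OFFICE],
}


def resolve_managed(crawled_values):
    """Turn one document's crawled values into managed property values."""
    managed = {}
    for name, sources in MAPPINGS.items():
        for source in sources:
            if source in crawled_values:
                managed[name] = crawled_values[source]
                break  # assumption: the first mapped source with a value wins
    return managed


def matches(managed, restriction):
    """Evaluate a single property:value restriction, ignoring case."""
    name, _, value = restriction.partition(":")
    return managed.get(name, "").lower() == value.lower()
```

A query such as `Author:sweeny` then tests the friendly name rather than the underlying GUID and property ID.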

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping

Simplified Administration UI

Search settings page at the SSP level

Managing crawls
• Content sources
• Explicit SharePoint content source type
• Content source for Business Data (Enterprise CAL)

Crawl logs
• Snapshot of crawled content in your index: lists all documents found in the content source and their status
• Filters by date, site, etc.
• Summary by host name (of successes, errors, and warnings)

Crawl rules
• Included and excluded rules
• Ability to pre-test crawl rules
• Easy to change the order of crawl rules

Managing scopes
• Scopes are decoupled from content sources
• Scopes can span multiple content sources
• Scope by property, site, content source, and URL
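Rule order matters because an ordered rule list is usually evaluated top to bottom. The Python sketch below shows that idea under the assumption that the first matching rule decides. The URL patterns, the `is_crawled` helper, and the default of including unmatched URLs are hypothetical and do not represent the MOSS crawler's actual implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered rule list: (URL pattern, include?).
CRAWL_RULES = [
    ("http://intranet/hr/private/*", False),  # exclude rule
    ("http://intranet/hr/*", True),           # include rule
    ("http://intranet/archive/*", False),     # exclude rule
]


def is_crawled(url, rules=CRAWL_RULES, default=True):
    """Return the decision of the first rule whose pattern matches the URL."""
    for pattern, include in rules:
        if fnmatchcase(url, pattern):
            return include
    return default  # assumption: URLs that match no rule are crawled
```

Reversing the list lets the broad `hr/*` include rule shadow the narrower exclude rule, which is the kind of mistake pre-testing crawl rules is meant to catch.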

Indexing Performance Improvements

• Search is a shared service
  - Unified WSS and MOSS search, with 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and best bets can also be administered at the consuming sites
• Crawls go to small indexes that are then consolidated at scheduled times in a "master merge"
• The content index holds the text of pages; the property store holds other document values
• Data is propagated incrementally to the query servers as it is being indexed
  - Propagation starts within 30 seconds of the first shadow index being written
  - No need to wait until the end of the crawl for information to be available in queries
  - No propagation of properties
• Single-item add and removal without re-indexing the entire corpus, with continuous propagation
  - Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on the site have changed
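The two crawl types above can be sketched in Python as follows. This is a simplified model, not the MOSS crawler: the change-log format, the index layout, and the `apply_change_log` helper with its `fetch_text` and `fetch_acl` callbacks are all assumptions.

```python
def apply_change_log(index, change_log, fetch_text, fetch_acl):
    """Apply (url, change) entries to the index and return the URLs whose
    content had to be fetched again."""
    fetched = []
    for url, change in change_log:
        if change == "delete":
            index.pop(url, None)
        elif change == "security":
            # Security-only change: refresh permissions, keep indexed text.
            index[url]["acl"] = fetch_acl(url)
        else:  # "add" or "update"
            index[url] = {"text": fetch_text(url), "acl": fetch_acl(url)}
            fetched.append(url)
    return fetched
```

Only added or updated items have their content fetched; unchanged items are never touched, and a permission change costs one ACL lookup rather than a re-index.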

Relevance Types

• Dynamic ranking = relevance impacted by the query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
• Static ranking = relevance independent of the customer query
  - URL depth
  - Click distance
  - Authority/demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items

Relevance Enhancements

• Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0-result queries
• Search Alerts
  - Users can subscribe to receive email when results change
• File type filtering
  - Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, TXT)
  - Supports 220 file types from Microsoft and non-Microsoft applications
• Property weights
  - Assign different weights to properties so that important properties, such as 'Title', have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: the weights used in the product were carefully tested. Changes to the weights may have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.

MOSS 2007 Faceted Search

Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search

Empowers customers to refine their search most effectively

Filters results by predetermined categories

Federated Search

• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that support OpenSearch 1.1
• Passes the query to the subscribed resource and returns the results in a single interface
• Relevance is calculated according to the originating resource's criteria, not MOSS 2007 criteria
• Pre-defined FLD files are found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
• You can develop your own FLD files if the destination supports OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
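Passing the query to an OpenSearch 1.1 source comes down to filling in that source's URL template. The Python sketch below illustrates the substitution for a few standard OpenSearch parameters (`searchTerms`, `startIndex`, `count`). The `fill_template` helper and the example.com template are hypothetical, and this is not how MOSS itself processes an FLD file.

```python
import re
from urllib.parse import quote_plus


def fill_template(template, search_terms, start_index=1, count=10):
    """Substitute OpenSearch 1.1 template parameters into a URL template.

    Parameters written as {name?} are optional and become empty strings
    when no value is known; unknown required parameters raise KeyError.
    """
    values = {
        "searchTerms": quote_plus(search_terms),
        "startIndex": str(start_index),
        "count": str(count),
    }

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if optional:
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)


# Hypothetical template of the kind an OpenSearch source publishes.
TEMPLATE = "http://example.com/search?q={searchTerms}&start={startIndex?}&lang={language?}&format=rss"
```

Because the remote source does its own ranking, the federating side only builds this URL and renders what comes back.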

People Search

Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites

Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
• Group results by social distance
• Subscribe to People Alerts

People Search Results Page

• Find people by project, expertise, or …
• Filter by relevant attributes
• Contact information & online availability

LOB Applications with BDC

• Extracts data from line-of-business, CRM, and other 3rd-party data stores
• Caches it for indexing by the search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application

FAST ESP Technology

• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing

Why is it unique?
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news): excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
• Allow users to refine search through filtering
• Define content resources and map them to business rules/key concepts
• Focused content = shared understanding = more precise results

Duplicate results filtering
• Collapses duplicates from the same directory or site to leave more room for other relevant results
• Less favoritism, more results on the desired page 1

Definitions
• Automatically extracts "definitions" from indexed content and displays them as matches directly on the results page
• A web part property on the Search Best Bets web part (can turn display of definitions on/off)
• Returned in the Query Object Model
• Cannot be edited

Best Bets
• Editorially assigned results, based on these key concepts, assigned to selected query terms
• Can be many-to-many

Scalability

• No physical limit on the maximum number of documents in one index
• Recommended limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large and small documents count the same
• The 'average document size' depends on the corpus mix
  - e.g., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security

• Query-time stripping: the customer sees only those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists from appearing in search results at a site/list level
• "Security only" crawl for single-item add/removal
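Query-time stripping can be pictured as a filter applied to the raw result list before it is shown. The Python sketch below assumes each indexed item carries a set of groups allowed to read it. The `trim_results` helper, the group names, and the result layout are hypothetical and are not the MOSS security model.

```python
def trim_results(results, user_groups):
    """Keep only results whose ACL shares at least one group with the user."""
    return [item for item in results if item["acl"] & user_groups]


# Hypothetical raw results, already ordered by relevance.
RAW_RESULTS = [
    {"title": "Salary bands", "acl": {"hr"}},
    {"title": "Holiday calendar", "acl": {"everyone"}},
    {"title": "Merger plan", "acl": {"executives"}},
]
```

Ranking order is preserved; items the user cannot open simply never appear, which is why the ACLs stored at crawl time have to be kept current.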

Search Analytics

Export search logs to Excel
• Query terms
• Page views
• Number of results returned
• Volume trends
• Query success: can define success for certain query terms

Report Center
• Access to MOSS 2007 BI features
• Filters data for permissions and relevance

Key Performance Indicators (KPIs)
• Create a KPI list or other measures of success
• Default KPIs exist in the OOB deployment
• KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, and manually entered information

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
• Content creators
• Information Architect/User Experience Architect
• Taxonomist

Define key enterprise themes in content
• Map existing content to these themes
• Create filters and scopes that map to these themes

Get as much customer data as possible to find search pain points
• Review search logs and customer feedback mechanisms
• What are they trying to find?
• What terms are they using?

Assemble a cross-functional team to
• Assign relevance weighting that makes sense for customer behavior and the corpus
• Develop Best Bets for searches with 0 results
• Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
• Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
• Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle

• Known as the 80/20 rule
• Named after a late-19th-century economist
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
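One way to check whether the 80/20 pattern holds in your own search logs is to measure how much query volume the most popular terms carry. The Python sketch below assumes the log has already been exported as a plain list of query strings. The `head_share` helper and the sample data are hypothetical.

```python
from collections import Counter


def head_share(queries, head_fraction=0.2):
    """Return (share of query volume carried by the top terms, long-tail terms)."""
    if not queries:
        return 0.0, []
    counts = Counter(query.strip().lower() for query in queries)
    ranked = counts.most_common()
    head_size = max(1, round(len(ranked) * head_fraction))
    head_volume = sum(count for _, count in ranked[:head_size])
    tail_terms = [term for term, _ in ranked[head_size:]]
    return head_volume / len(queries), tail_terms


# Hypothetical log: one dominant query and four rare ones.
SAMPLE_LOG = ["vacation policy"] * 16 + ["401k", "parking", "badge", "goat rodeo"]
```

The returned tail list is the Long Tail from the slide: each term is rare on its own, but those are the queries most likely to end in zero results.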

Define Content

Define content scopes
• Segment content into logical groups
• Create scope rules based on
  - Address
  - Property query
  - Content source
• At the SSP level or individual level
• SSP-level scopes are shared among all sites that use the SSP

Select Authority resources

Define special terms if needed
• Terms or language proprietary to the enterprise
  - e.g., "goat rodeo"
• Provides additional clarification for the searcher
• Use synonym mapping for term variants
  - C# and Csharp
• Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
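The special-term behaviour can be sketched in Python as a keyword table in which synonyms resolve to the same definition and Best Bet. This is an illustrative model, not the MOSS keyword store: the `lookup_special_term` helper, the definition text, and the Best Bet URL are all hypothetical.

```python
# Hypothetical keyword table: one entry per special term.
SPECIAL_TERMS = [
    {
        "keyword": "c#",
        "synonyms": {"csharp"},
        "definition": "Placeholder definition shown above the results.",
        "best_bet": "http://intranet.example/dev/csharp",
    },
]


def lookup_special_term(query, terms=SPECIAL_TERMS):
    """Return (definition, best_bet) when the query is a keyword or synonym."""
    normalized = query.strip().lower()
    for term in terms:
        if normalized == term["keyword"] or normalized in term["synonyms"]:
            return term["definition"], term["best_bet"]
    return None
```

Whichever variant the searcher types, they get the same two information points; queries that match no keyword fall through to ordinary results.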

Designate Authority Sites

Hilltop Algorithm
• Quality of links is more important than quantity of links
• Segmentation of the corpus into broad topics
• Selection of authority sources within these topic areas
• Pre-query calculation applied at query time

Topic-Sensitive PageRank
• Consolidation of Hypertext Induced Topic Selection (HITS) and PageRank
• Pre-query calculation of factors based on a subset of the corpus
  - Context of term use in the document
  - Context of term use in the history of queries
  - Context of term use by the user submitting the query

Educate Structural Influences

File Type Bias
• In order of relevancy (highest to lowest)
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items

Auto Language Detect
• Foreign-language results are less relevant than results in the user's language
• English is always considered as relevant as the user's language

URL Depth and Click Distance
• Short URLs are like prime real estate
• Items with shorter URLs are considered more relevant than items placed at longer URLs
  - The level is determined by counting the slash ("/") characters in the URL
• Keywords separated by hyphens in the URL are good
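As a rough illustration of URL depth, the Python sketch below counts non-empty path segments, which is one reasonable way to count the slashes that matter. The `url_depth` helper and the sample URLs are hypothetical, and the real ranking function may count differently.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the non-empty path segments of a URL (the site root has depth 0)."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


# Hypothetical pages, from shallow to deep.
PAGES = [
    "http://intranet/hr/benefits/2008/dental-plan.aspx",
    "http://intranet/",
    "http://intranet/hr/benefits.aspx",
]
```

Sorting the pages by `url_depth` puts the site root first and the deeply nested page last, which matches the "prime real estate" point: important content belongs near the top of the hierarchy.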

Educate Content Influences

Anchor Link Text
• Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
• Any file types handled by installed 3rd-party iFilter components that emit hyperlinks

Metadata extraction
• Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
• Auto description text

Optimized URLs
• Enterprise Search checks URL matching at query time
• If the query matches the host name of a page in the index, that page is displayed as the first result

Enhanced Search Results

• Synonym Mapping
• Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for the indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine
  - In most cases your search index is on its own server

Indexing Configuration

• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
  - Schedule for off-hours crawling where appropriate
  - Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Properties and schema
  - Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also note that it is not an easy installation; users must read the entire documentation for it before upgrading their portal
  - More people destroy their portal than upgrade it because they skip the documentation and do not install the prerequisite patches
• Must schedule the incremental crawl so that it catches additions to the document set
• Must turn on the PDF indexer and stemming

Dragons 2

• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The expert search capability is predicated on the My Sites profile
  - Employee participation is critical to optimal functionality
• Benefits of click distance are missed if Authority sites are not configured
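Why authority sites matter for click distance is easier to see with a small model. The Python sketch below treats click distance as the smallest number of link hops from any authority page, computed with a breadth-first search. The `click_distance` helper and the link graph are hypothetical, and MOSS's own calculation is not reproduced here.

```python
from collections import deque


def click_distance(links, authority_pages):
    """Return the minimum number of clicks from any authority page to each
    reachable page, using breadth-first search over the link graph."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph: page -> pages it links to.
LINKS = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/dental"],
    "/orphan": ["/"],
}
```

With only the root as an authority, the dental page is three clicks away; naming `/hr/benefits` as a second authority brings it to one click. With no authorities configured there is nothing to measure from, which is the dragon described above.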

Dragons 3

• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results are delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources

• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx



Indexing Performance Improvements Search is a shared service

ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered

centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites

Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo

Content index that holds text of pages with Property store that holds other document values

Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties

Single item add removal without re-indexing entire corpus with continuous propagation

ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items

ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed

Relevance Types Dynamic ranking = relevance impacted by query term

ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL

Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails

XML files Excel spreadsheets Plain text List items

Relevance EnhancementsManually assign synonyms and editorialized results to keywords

ndash Use search logs to detect popular searches low click-through from results or 0 result queries

Search Alertsndash User can subscribe to receive email when results change

File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)

than others (XML txt)ndash Supports 220 files types MS and non-MS application

Property weights ndash Assign different weights to properties so that important

properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object

Modelndash Note The weights used in the product were carefully tested

Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial

undertaking

MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results

bullCan be presented pre- or post- querybullUsed for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People Search
Build and publish rich personal profiles
- Customize personal profile attributes
- Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
- Control access to information using security and privacy controls
- Generate and display organizational charts based on directory information
- Publish personal profiles using MOSS My Sites
Identify people who can help
- Find people based on keyword matches with MOSS personal profiles
- Find people in line-of-business systems
- Filter results by common attributes such as Job Title or Department
- Find "in-common" connections including managers, site memberships, distribution lists, and colleagues
- Group results by social distance
- Subscribe to People Alerts

People Search Results Page
Find people by project, expertise, or…
Filter by relevant attributes
Contact information & online availability

LOB Applications with BDC
Extracts data from line-of-business, CRM, and other 3rd party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application

FAST ESP Technology
FAST is a sophisticated search engine tailor-made for ecommerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process: conversion - language detection - synonyms - spell check - external call outs - entity extraction - categorization - vectorization - custom navigation - normalizer - alerting - indexing
Why is it unique?
- Auto classification
- Advanced linguistics: text mining for concept and relationship mapping
- Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
- Precision: exact word matching, exact phrase matching, proximity, tokenization
- Location-aware results (retail and news) - excellent for mobile search
- Recommendation engine
- Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results
Search Scopes
- Allow users to refine search through filtering
- Define content resources and map to business rules/key concepts
- Focused content = shared understanding = more precise results
Duplicate results filtering
- Collapsing duplicates from same directory or site to leave more room for other relevant results
- Less favoritism, more results on desired page 1
Definitions
- Automatically extract "definitions" from indexed content and display them as matches directly on the results page
- A web property on the Search Best Bets web part (can turn on/off display of definition)
- Returned in the Query Object Model
- Cannot be edited
Best Bets
- Editorially assigned results based on these key concepts assigned to selected query terms
- Can be many-to-many
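The idea of collapsing results from the same directory or site can be sketched as a grouping pass over a ranked list of URLs. This is only an illustration of the concept. The grouping key (host plus parent directory) and the function name are assumptions, and MOSS's own duplicate handling may differ.

```python
import posixpath
from urllib.parse import urlsplit


def collapse_by_directory(result_urls, keep_per_group=1):
    """Keep at most `keep_per_group` results per (host, directory) pair,
    preserving the original ranking order."""
    seen = {}
    kept = []
    for url in result_urls:
        parts = urlsplit(url)
        group = (parts.netloc.lower(), posixpath.dirname(parts.path))
        seen[group] = seen.get(group, 0) + 1
        if seen[group] <= keep_per_group:
            kept.append(url)
    return kept


# Hypothetical ranked results.
urls = [
    "http://intranet/hr/policies/leave.aspx",
    "http://intranet/hr/policies/travel.aspx",
    "http://intranet/it/help.aspx",
]
print(collapse_by_directory(urls))
```

Dropping the second hit from the same directory is what frees a slot on page 1 for a result from elsewhere.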

Scalability
No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large/small documents count the same
The 'average document size' depends on the corpus mix
- i.e. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware

Security
Query time stripping - customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 Sites
Implements ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
Still provides the same security trimming functionality
Automatically configured for new sites
Search visibility options
Prevent sites/lists appearing in search results at a site/list level
"Security only" crawl for single item add/removal
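Query-time trimming means the index may contain everything, but each result is checked against the searcher's permissions before it is shown. A minimal sketch follows, assuming each hit carries a set of allowed groups. The data shape and names are hypothetical, not the MOSS security model.

```python
def trim_results(results, user_groups):
    """Return only the hits that at least one of the user's groups may read."""
    user_groups = set(user_groups)
    return [r for r in results if user_groups & set(r["allowed_groups"])]


# Made-up index hits with made-up access lists.
index_hits = [
    {"title": "Cafeteria menu", "allowed_groups": {"Everyone"}},
    {"title": "Salary bands", "allowed_groups": {"HR"}},
]

print([r["title"] for r in trim_results(index_hits, {"Everyone", "Engineering"})])
```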

Search Analytics
Export search logs to Excel
- Query terms
- Page views
- Number of results returned
- Volume trends
- Query success: can define success for certain query terms
Report Center
- Access to MOSS 2007 BI features
- Filters data for permissions and relevance
Key Performance Indicators [KPI]
- Create a KPI list or other measures of success
- Default KPIs exist in OOB deployment
- KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
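Once the query log is exported, a few lines of scripting can surface the queries that returned nothing, which are natural candidates for Best Bets or synonyms. The sketch assumes a CSV with `query` and `results` columns. That layout is an assumption, so adjust it to whatever the actual export contains.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported log; the column names are assumed for illustration.
SAMPLE_LOG = """query,results
vacation policy,12
goat rodeo,0
expense report,7
goat rodeo,0
parking pass,0
"""


def zero_result_queries(log_file):
    """Return (query, count) pairs for queries that returned 0 results,
    most frequent first."""
    counts = Counter()
    for row in csv.DictReader(log_file):
        if int(row["results"]) == 0:
            counts[row["query"].strip().lower()] += 1
    return counts.most_common()


print(zero_result_queries(io.StringIO(SAMPLE_LOG)))
```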

Configuring MOSS 2007 Search

Search Roadmap
Useful participants
- Content creators
- Information Architect/User Experience Architect
- Taxonomist
Define key enterprise themes in content
- Map existing content to these themes
- Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
- Review search logs and customer feedback mechanisms
- What are they trying to find?
- What terms are they using?
Assemble a cross-functional team to
- Assign relevance weighting that makes sense to the customer behavior and the corpus
- Develop Best Bets for searches with 0 results
- Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
- Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
- Design a structure that leverages the structural elements like URL depth and click distance

Pareto's Principle
Known as the 80/20 rule
Named after late 19th century economist
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
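Whether the 80/20 split holds for a given corpus can be checked from click data. The sketch below assumes a non-empty list of document identifiers, one entry per clicked result. The input format and the function name are assumptions for illustration.

```python
from collections import Counter


def top_share(clicked_docs, top_fraction=0.2):
    """Fraction of all clicks that land on the most-clicked `top_fraction`
    of distinct documents. Expects a non-empty list."""
    counts = Counter(clicked_docs)
    ranked = [n for _, n in counts.most_common()]
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Made-up click log: 5 distinct documents, 12 clicks.
clicks = ["home"] * 8 + ["hr", "it", "legal", "news"]
print(top_share(clicks))
```

In this made-up log the top 20% of documents (one page) draws two thirds of the clicks. The remaining third is the long tail that still needs attention.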

Define Content
Define content scopes
- Segment content into logical groups
- Create scope rule based on
  - Address
  - Property query
  - Content source
- At the SSP level or individual level
- SSP level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
- Terms or language proprietary to the enterprise
  - i.e. "goat rodeo"
- Provides additional clarification for searcher
- Use synonym mapping for term variants
  - C# and Csharp
- Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
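Synonym mapping for term variants can be pictured as a lookup applied to each query term before matching. The table and the `expand_query` helper below are a hypothetical sketch, not how MOSS stores keywords or its thesaurus.

```python
# Hypothetical synonym table; variants map to each other in both directions.
SYNONYMS = {"c#": {"csharp"}, "csharp": {"c#"}}


def expand_query(query, synonyms=SYNONYMS):
    """For each query term, return the term followed by its known variants."""
    expanded = []
    for term in query.lower().split():
        variants = [term] + sorted(synonyms.get(term, ()))
        expanded.append(variants)
    return expanded


print(expand_query("C# tutorial"))
```

A search engine would then treat each inner list as alternatives (an OR), so a query for either spelling finds documents using the other.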

Designate Authority Sites
Hilltop Algorithm
- Quality of links more important than quantity of links
- Segmentation of corpus into broad topics
- Selection of authority sources within these topic areas
- Pre-query calculation applied at query time
Topic Sensitive PageRank
- Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
- Pre-query calculation of factors based on subset of corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query
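Designating authority sites matters because click distance is measured from them. One way to picture that pre-query calculation is a breadth-first search over the link graph, starting at the authority pages. This is a conceptual sketch with a made-up graph, not the algorithm MOSS actually runs.

```python
from collections import deque


def click_distance(links, authority_pages):
    """Fewest clicks needed to reach each page from any authority page.
    `links` maps a page to the pages it links to."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:  # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph.
links = {
    "home": ["hr", "it"],
    "hr": ["hr/policies"],
    "hr/policies": ["hr/policies/leave"],
    "orphan": ["home"],
}
print(click_distance(links, ["home"]))
```

Pages that no authority page leads to (here `orphan`) get no distance at all, which is why click distance brings no benefit when authority sites are left unconfigured.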

Educate Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
- HTML Web pages
- PowerPoint presentations
- Word documents
- Emails
- XML files
- Excel spreadsheets
- Plain text files
- List items
Auto Language Detect
- Foreign language results are less relevant than results in user's language
- English language is always considered as relevant as user's language
URL Depth and Click Distance
- Short URLs are like prime real estate
- Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
- Keywords separated by hyphens in the URL are good
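URL depth as described here is simply a count of slashes. The sketch below counts them in the path portion only, ignoring the `//` after the scheme and any trailing slash. That reading is an assumption, since the slide does not say exactly which slashes count.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Number of '/' characters in the URL path, ignoring a trailing slash."""
    return urlsplit(url).path.rstrip("/").count("/")


# Hypothetical intranet URLs.
pages = [
    "http://intranet/hr/policies/leave.aspx",
    "http://intranet/",
    "http://intranet/hr/",
]
for page in sorted(pages, key=url_depth):
    print(url_depth(page), page)
```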

Educate Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
- HTML anchor elements
- SharePoint Services link lists
- SharePoint Portal Server 2003 listings
- Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
- Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
- Primarily based on text formatting features
- Shadow title is added automatically to the document
- Weighted the same as the original title
- Only for Microsoft Office file types
Auto Description text
Optimized URLs
- Enterprise Search checks URL matching at query time
- If query matches to the host name of a page in the index, it will display as the first result

Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
In most cases your search index is on its own server

Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using single content access account per region
Regularly clean up and review
- Crawl rules
- Property and schema
- Best Bets keywords

Customizing Results Display
To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
- Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
- More people destroy their portal than upgrade it, because they do not read the documentation or install the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2
Use the Web Part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile
- Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured

Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
- Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
- Index files, scopes, search alerts, filters, word breakers, thesaurus files not upgraded

Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
- Extracts concepts and weights their relevance to searcher query
  - Presents for search refinement
- http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
- Extracts metadata and compound terms
- Incorporates with existing taxonomy if one exists
- Appends metadata and stores as MOSS property
- Part of the main MOSS index
- Uses standard MOSS administration features

Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Poll once per second until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks

Push/Pull Data to Users
Alerts
- Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alerts notifications
- Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
- 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
- A rollup of all user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
- Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
- Ability to subscribe for an RSS feed on the search results
- 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    // KeywordQuery and ResultType live in this namespace
    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
Crawled properties
- Emitted by iFilters and Protocol Handlers
- Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
- Mapping target for crawled properties (many-to-many)
- Identified by internal ID
- Friendly name used in queries
  - Can be used in the query with property:Value
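The property:Value form lets a query mix free-text keywords with filters on managed properties. A rough sketch of splitting such a query string is shown below. The parser is an illustration only, handles none of the real keyword syntax beyond that one form, and uses example property names that may not exist in a given deployment.

```python
def parse_query(query):
    """Split a query into plain keywords and property:value filters."""
    keywords, filters = [], {}
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            filters[name.lower()] = value  # managed property filter
        else:
            keywords.append(token)  # ordinary keyword
    return keywords, filters


print(parse_query("budget author:Sweeny filetype:ppt"))
```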


Relevance Types
Dynamic ranking = relevance impacted by query term
- Frequency
- Location in document
- Appearance in link text
- Appearance in URL
Static ranking = relevance independent of customer query
- URL Depth
- Click Distance
- Authority/Demoted site
- Change property weights
- Language of customer (browser setting)
- Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, Plain text, List items

Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
- Use search logs to detect popular searches, low click-through from results, or 0 result queries
Search Alerts
- User can subscribe to receive email when results change
File type filtering
- Some file types are deemed more relevant (i.e. HTML, DOC) than others (XML, txt)
- Supports 220 file types, MS and non-MS applications

Property weights ndash Assign different weights to properties so that important

properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object

Modelndash Note The weights used in the product were carefully tested

Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial

undertaking

MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results

bullCan be presented pre- or post- querybullUsed for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People SearchBuild and publish rich personal profiles

Customize personal profile attributes Populate personal profiles using information from Active Directory other

LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory

information Publish personal profiles using MOSS My Sites

Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships

distribution lists and colleagues Group results by social distance Subscribe to People Alerts

People Search Results Page

Find people by project expertise orhellip

Find people by project expertise orhellip

Filter by relevant attributes

Filter by relevant attributes

Contact information amp online availabilityContact information amp online availability

Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search

service Searches any data source

accessible through ADOnet or Web Services

Uses Live Communication Server for connectivity options

Aggregated into a single application

LOB Applications with BDC

FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help

desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-

external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing

Why is it Unique Auto Classification Advanced Linguistics text mining for

concept and relationship mapping Recall Lemmatization synonym

expansion wildcards anti-phrasing phonetic search

Precision Exact word matching exact phrase matching proximity tokenization

Location aware results (retail and news) ndash excellent for mobile search

Recommendation engine Increased capacity100-200 million

documents on 1 server and 150 million qsecond

Custom Results Search Scopes

Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results

Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other

relevant results Less favoritism more results on desired page 1

Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as

matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of

definition) Returned in the Query Object Model Can not be edited

Best Bets Editorially assigned results based on these key concepts assigned to selected

query terms Can be many-to-many

Scalability No physical limit for the maximum number of

documents in one index Recommended document limit is 50 Millions of

documents per indexer A document is anything from a Word or PowerPoint

file to a web page an individual SharePoint list item one people entry or an SAP customer record

Largesmall documents count the same The lsquoaverage document sizersquo depends on the

corpus mixndash ie heavy use of WSS 30 lists versus limited use

Dependent on supporting hardware

Security Query time stripping ndash customer only sees those results

that they have permission to view Support for pluggable authentication for content in

SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model

Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites

Search visibility options Prevent siteslists appearing in search results at a

sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval

Search Analytics Export search logs to Excel

Query terms Page views Number of results returned

Volume trends Query success can define success for

certain query terms Report Center

Access to MOSS 2007 BI features Filters data for permissions and relevance

Key Performance Indicators [KPI] Create a KPI list or other measures of

success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS

2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1
Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also note that it is not an easy installation: read the entire documentation before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2
Use the Web Part that accommodates wildcard search, found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile; employee participation is critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured

Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
– Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
– Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
Concept Searching
– Auto-classifies documents for MOSS 2007
– Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
– Extracts concepts and weights their relevance to the searcher's query
– Presents for search refinement
– http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
– Extracts metadata and compound terms
– Incorporates with existing taxonomy if one exists
– Appends metadata and stores as MOSS property
– Part of the main MOSS index
– Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking

Change default property weights through the Schema Object Model

using Microsoft.Office.Server.Search.Administration;

Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

// Dump parameters (lookup by name)
foreach (RankingParameter param in ranking.RankingParameters)
{
    RankingParameter lookedup = ranking.RankingParameters[param.Name];
    Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
}

// Lookup by index
for (int i = 0; i < ranking.RankingParameters.Count; i++)
{
    RankingParameter param = ranking.RankingParameters[i];
    Console.WriteLine(param.Name + ": " + param.Value);
}

// Setting the weight of property 'property' to 'weight'
ranking.RankingParameters[property].Value = float.Parse(weight);
ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

// Poll once per second until the ranking update has finished
Console.Write("Updating ");
while (ranking.Status != RankingUpdateStatus.Idle)
{
    Console.Write(".");
    System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
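To see why a property weight changes ranking, here is a minimal Python sketch. It is not the MOSS ranking formula: the weights, the property names, and the simple "weight times term count" score are all assumptions chosen for illustration.

```python
from typing import Dict

# Hypothetical weights; the real MOSS defaults are not given in these slides.
PROPERTY_WEIGHTS: Dict[str, float] = {"title": 5.0, "author": 2.0, "body": 1.0}


def weighted_score(doc: Dict[str, str], term: str,
                   weights: Dict[str, float] = PROPERTY_WEIGHTS) -> float:
    """Sum over properties of weight * number of occurrences of the term."""
    term = term.lower()
    score = 0.0
    for prop, weight in weights.items():
        words = doc.get(prop, "").lower().split()
        score += weight * words.count(term)
    return score


docs = [
    {"title": "Travel guide", "body": "Expense limits and expense forms for travel"},
    {"title": "Expense policy", "body": "How to file an expense report"},
]

# A single hit in the heavily weighted title outranks two hits in the body.
ranked = sorted(docs, key=lambda d: weighted_score(d, "expense"), reverse=True)
```

Raising or lowering one weight reorders every query that touches that property, which is why the caution above about retesting relevance applies.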

Push/Pull Data to Users
Alerts
– Same alerting infrastructure for WSS and MOSS: the Timer service is used to handle all alert notifications
– Frequency can be set to Daily/Weekly: notifications for search alerts will be sent according to the creation time
– 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
– A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx
– Alert "gotchas": no "My Alerts Summary" web part; no upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
RSS Feeds
– Ability to subscribe to an RSS feed on the search results
– 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents
Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

KeywordQuery request = new KeywordQuery(site);
request.QueryText = strQuery;
request.ResultTypes |= ResultType.RelevantResults;
// If we want to get more than one result table
request.ResultTypes |= ResultType.SpecialTermResults;
// Setting optional parameters on the Query object
request.RowLimit = 10;
request.StartRow = 0;
request.KeywordInclusion = KeywordInclusion.AllKeywords;
// Executing the query
ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
– Emitted by iFilters and Protocol Handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
– Can be used in the query with property:Value
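The many-to-many mapping can be pictured with a small Python sketch. Everything here is hypothetical: the GUID strings, the property names, and the `matches` helper only imitate the idea of a property:Value restriction and are not part of any MOSS API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A crawled property is identified by (property-set GUID, property ID).
CrawledKey = Tuple[str, str]

# Hypothetical mappings from crawled properties to managed property names.
MAPPINGS: List[Tuple[CrawledKey, str]] = [
    (("guid-office", "Title"), "Title"),
    (("guid-web", "og:title"), "Title"),          # many crawled -> one managed
    (("guid-office", "Author"), "Author"),
    (("guid-office", "Author"), "Contributor"),   # one crawled -> many managed
]


def to_managed(crawled: Dict[CrawledKey, str]) -> Dict[str, List[str]]:
    """Collect crawled values under every managed property they map to."""
    managed: Dict[str, List[str]] = defaultdict(list)
    for key, managed_name in MAPPINGS:
        if key in crawled:
            managed[managed_name].append(crawled[key])
    return dict(managed)


def matches(managed: Dict[str, List[str]], restriction: str) -> bool:
    """Evaluate one 'property:value' restriction, case-insensitively."""
    prop, _, value = restriction.partition(":")
    return any(value.lower() in v.lower() for v in managed.get(prop, []))
```

For example, a document whose only crawled value is an Office Author of "Sweeny" ends up searchable as both `Author:sweeny` and `Contributor:sweeny`.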

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping

Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
– Use search logs to detect popular searches, low click-through from results, or 0-result queries
Search Alerts
– User can subscribe to receive email when results change
File type filtering
– Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, TXT)
– Supports 220 file types, MS and non-MS applications
Property weights
– Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
– Change default property weights through the Schema Object Model
– Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking
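The search-log review mentioned above can be sketched in a few lines of Python. The log row layout (query, result count, clicked flag) and the thresholds are assumptions for illustration; a real MOSS query log export will have different columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed log row: (query text, number of results returned, was a result clicked)
LogRow = Tuple[str, int, bool]


def summarize(log: Iterable[LogRow], min_searches: int = 2,
              max_ctr: float = 0.25) -> Dict[str, List[str]]:
    """Find popular queries, zero-result queries, and low click-through queries."""
    searches: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    zero_results = set()
    for query, result_count, clicked in log:
        q = query.strip().lower()          # treat case variants as one query
        searches[q] += 1
        clicks[q] += int(clicked)
        if result_count == 0:
            zero_results.add(q)
    popular = sorted(searches, key=lambda q: (-searches[q], q))
    low_ctr = sorted(
        q for q in searches
        if q not in zero_results
        and searches[q] >= min_searches
        and clicks[q] / searches[q] <= max_ctr
    )
    return {"popular": popular,
            "zero_results": sorted(zero_results),
            "low_click_through": low_ctr}
```

Zero-result queries are candidates for synonyms or Best Bets; popular queries with low click-through suggest the right content exists but is ranked or titled poorly.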

MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
– Can be presented pre- or post-query
– Used for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories
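A post-query facet is essentially a count of results per category plus a filter. The sketch below assumes each result is a plain dictionary with a hypothetical `department` field; it illustrates the idea only and is not the MOSS faceted search web part.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(results: List[Dict[str, str]], field: str) -> Counter:
    """Count how many results fall into each value of the facet field."""
    return Counter(r[field] for r in results if field in r)


def refine(results: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Keep only the results in the category the customer picked."""
    return [r for r in results if r.get(field) == value]


results = [
    {"title": "Benefits overview", "department": "HR"},
    {"title": "Laptop request", "department": "IT"},
    {"title": "Vacation policy", "department": "HR"},
]
counts = facet_counts(results, "department")   # shown next to the result list
hr_only = refine(results, "department", "HR")  # applied when a facet is clicked
```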

Federated Search
Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that subscribe to OpenSearch 1.1
Passes the query into the subscribed resource and returns results into a single interface
Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop own FLD files if destination subscribes to OpenSearch 1.1
– Day Software has developed a standard connector for LiveLink ECM
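Passing the query to an OpenSearch 1.1 source comes down to filling in the source's URL template, where `{searchTerms}` is the required placeholder and a trailing `?` marks an optional one. The Python sketch below shows that substitution; the example.com template is a made-up location, and the function is an illustration rather than how MOSS itself processes an FLD file.

```python
import re
from urllib.parse import quote_plus


def build_query_url(template: str, search_terms: str, start_page: int = 1) -> str:
    """Fill an OpenSearch URL template with the user's query."""
    values = {"searchTerms": quote_plus(search_terms), "startPage": str(start_page)}

    def substitute(match: "re.Match") -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if optional:            # unknown optional parameters become empty
            return ""
        raise KeyError(name)    # an unknown required parameter cannot be filled

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


# Hypothetical federated location
template = "http://example.com/search?q={searchTerms}&p={startPage?}&lang={language?}"
url = build_query_url(template, "enterprise search")
```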

People Search
Build and publish rich personal profiles
– Customize personal profile attributes
– Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
– Control access to information using security and privacy controls
– Generate and display organizational charts based on directory information
– Publish personal profiles using MOSS My Sites
Identify people who can help
– Find people based on keyword matches with MOSS personal profiles
– Find people in line-of-business systems
– Filter results by common attributes such as Job Title or Department
– Find "in-common" connections including managers, site memberships, distribution lists, and colleagues
– Group results by social distance
– Subscribe to People Alerts

People Search Results Page
Find people by project, expertise, or…
Filter by relevant attributes
Contact information & online availability

LOB Applications with BDC
Extracts data from line-of-business, CRM, and other 3rd-party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application

FAST ESP Technology
FAST is a sophisticated search engine tailor-made for ecommerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process, in order: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
– Auto classification
– Advanced linguistics: text mining for concept and relationship mapping
– Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
– Precision: exact word matching, exact phrase matching, proximity, tokenization
– Location-aware results (retail and news): excellent for mobile search
– Recommendation engine
– Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results
Search Scopes
– Allow users to refine search through filtering
– Define content resources and map to business rules/key concepts
– Focused content = shared understanding = more precise results
Duplicate results filtering
– Collapsing duplicates from same directory or site to leave more room for other relevant results
– Less favoritism, more results on desired page 1
Definitions
– Automatically extract "definitions" from indexed content and display them as matches directly on the results page
– A web property on the Search Best Bets web part (can turn on/off display of definition)
– Returned in the Query Object Model
– Cannot be edited
Best Bets
– Editorially assigned results based on these key concepts assigned to selected query terms
– Can be many-to-many
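Duplicate collapsing can be illustrated with a short Python sketch. It assumes each result carries a `content_hash` field (a hypothetical stand-in for however the engine detects identical content) and collapses only duplicates that sit in the same directory on the same host; this is a simplification, not the MOSS implementation.

```python
import posixpath
from typing import Dict, List
from urllib.parse import urlsplit


def collapse_duplicates(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first result per (host, directory, content hash) combination."""
    seen = set()
    kept = []
    for result in results:
        parts = urlsplit(result["url"])
        directory = posixpath.dirname(parts.path)
        key = (parts.netloc.lower(), directory, result["content_hash"])
        if key in seen:
            continue            # same content, same place: collapse it
        seen.add(key)
        kept.append(result)
    return kept
```

Because results arrive in rank order, keeping the first occurrence preserves the best-ranked copy and frees the remaining slots on page 1 for other documents.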

Scalability
No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large/small documents count the same
The 'average document size' depends on the corpus mix
– i.e. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
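Since every item counts as one document regardless of size, a rough capacity check is just a sum against the 50 million recommendation. The Python sketch below uses made-up corpus counts; only the 50 million figure comes from the slide.

```python
from typing import Dict, Tuple

RECOMMENDED_DOCS_PER_INDEXER = 50_000_000  # recommendation quoted on the slide


def index_utilization(counts: Dict[str, int]) -> Tuple[int, float]:
    """Return (total documents, fraction of the recommended limit used)."""
    total = sum(counts.values())
    return total, total / RECOMMENDED_DOCS_PER_INDEXER


# Hypothetical corpus mix: list items and people entries count like files.
corpus = {"files": 6_000_000, "list_items": 6_000_000, "people": 500_000}
total, used = index_utilization(corpus)
```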

Security
Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
– Implements ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
– Still provides the same security trimming functionality
– Automatically configured for new sites
Search visibility options
– Prevent sites/lists appearing in search results at a site/list level
"Security only" crawl for single item add/removal
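Query-time trimming can be pictured as a filter applied to the ranked results before they are displayed. In this Python sketch the `allowed` sets and group names are hypothetical; it shows the principle only and says nothing about how MOSS stores or checks access control lists.

```python
from typing import Dict, List, Set


def trim_results(results: List[Dict], user_groups: Set[str]) -> List[Dict]:
    """Drop every result the user has no permission to view."""
    return [r for r in results if r["allowed"] & user_groups]


results = [
    {"title": "Cafeteria menu", "allowed": {"everyone"}},
    {"title": "Salary bands", "allowed": {"hr"}},
    {"title": "Merger plan", "allowed": {"executives"}},
]
visible = trim_results(results, {"everyone", "hr"})
```

The same query therefore returns different result lists to different people, which is worth remembering when someone reports that a document "is not in search".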

Search Analytics
Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
– Query success: can define success for certain query terms
Report Center
– Access to MOSS 2007 BI features
– Filters data for permissions and relevance
Key Performance Indicators [KPI]
– Create a KPI list or other measures of success
– Default KPIs exist in OOB deployment
– KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap
Useful participants
– Content creators
– Information Architect/User Experience Architect
– Taxonomist
Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross-functional team to
– Assign relevance weighting that makes sense to the customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle
Known as the 80/20 rule
Named after the late 19th century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
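Whether the 80/20 split actually holds for a given corpus can be checked from click counts. The Python sketch below assumes a simple mapping from document to number of result clicks, with made-up numbers; how those counts are obtained from the logs is left open.

```python
import math
from typing import Dict


def top_share(clicks_per_doc: Dict[str, int], fraction: float = 0.2) -> float:
    """Share of all clicks captured by the top `fraction` of documents."""
    counts = sorted(clicks_per_doc.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    top_n = max(1, math.ceil(len(counts) * fraction))
    return sum(counts[:top_n]) / total


# Hypothetical click counts for ten documents
clicks = {"a": 50, "b": 30, "c": 5, "d": 4, "e": 3,
          "f": 2, "g": 2, "h": 2, "i": 1, "j": 1}
share = top_share(clicks)
```

The remaining share is the Long Tail: individually rare documents that together still answer a meaningful portion of searches.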

Define Content
Define content scopes
– Segment content into logical groups
– Create scope rules based on address, property query, or content source
– At the SSP level or individual level
– SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise, e.g. "goat rodeo"
– Provides additional clarification for searcher
– Use synonym mapping for term variants, e.g. C# and Csharp
– Two information points can be displayed for a special term: definition of the term, and Best Bet
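Synonym mapping for term variants amounts to expanding a query term into every member of its synonym group. The Python sketch below uses a hypothetical in-memory group; in MOSS the equivalent data lives in keyword synonyms or the thesaurus files, which this example does not read.

```python
from typing import List, Set

# Hypothetical synonym groups; each set lists interchangeable term variants.
SYNONYM_GROUPS: List[Set[str]] = [
    {"c#", "csharp", "c sharp"},
]


def expand(term: str) -> List[str]:
    """Return all variants of a term, or the term itself if it has none."""
    normalized = term.strip().lower()
    for group in SYNONYM_GROUPS:
        if normalized in group:
            return sorted(group)
    return [normalized]
```

A search for any one variant can then be run as an OR over the whole group, so a document that only says "Csharp" is still found by a query for "C#".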

Designate Authority Sites
Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic-Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on subset of corpus
– Context of term use in document
– Context of term use in history of queries
– Context of term use by user submitting query
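Click distance from designated authority pages can be pictured as a shortest-path count over the link graph. The Python sketch below uses a breadth-first search over a hypothetical link map; it illustrates the concept and is not the calculation MOSS performs.

```python
from collections import deque
from typing import Dict, Iterable, List


def click_distances(links: Dict[str, List[str]],
                    authorities: Iterable[str]) -> Dict[str, int]:
    """Fewest link hops from any authority page to each reachable page."""
    distance = {page: 0 for page in authorities}
    queue = deque(distance)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:        # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph: page -> pages it links to
links = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "benefits": ["form"],
    "orphan": ["form"],
}
from_home = click_distances(links, ["home"])
```

Pages that never appear in the result, such as `orphan` here, are unreachable from any authority page and gain nothing from click distance, which is the risk of leaving authority sites unconfigured.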

Educate Structural Influences
File Type Bias
– In order of relevancy (highest to lowest): HTML Web pages, PowerPoint presentations, Word documents, emails, XML files, Excel spreadsheets, plain text files, list items
Auto Language Detect
– Foreign language results are less relevant than results in user's language
– English language is always considered as relevant as user's language
URL Depth and Click Distance
– Short URLs are like prime real estate
– Items with shorter URLs are considered more relevant than items placed in longer URLs
– The level is determined by reviewing the number of slash ("/") characters in the URL
Keywords separated by hyphens in the URL are good
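URL depth as described above is easy to check for your own content. The Python sketch below counts slash characters in the path part of a URL; ignoring the two slashes after the scheme is an assumption made here, and the portal URLs are invented.

```python
from typing import List
from urllib.parse import urlsplit


def url_depth(url: str) -> int:
    """Number of '/' characters in the path portion of the URL."""
    return urlsplit(url).path.count("/")


def shallowest_first(urls: List[str]) -> List[str]:
    """Order URLs from shortest path depth to deepest."""
    return sorted(urls, key=url_depth)


urls = [
    "http://portal/hr/benefits/2008/plan.doc",
    "http://portal/plan.doc",
    "http://portal/hr/plan.doc",
]
ordered = shallowest_first(urls)
```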

Educate Content Influences
Anchor Link Text
– Search indexes the anchor text from the following elements: HTML anchor elements; SharePoint Services link lists; SharePoint Portal Server 2003 listings; Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
– Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
– Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text
Optimized URLs
– Enterprise Search checks URL matching at query time
– If the query matches the host name of a page in the index, it will display as the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases your search index is on its own server


MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results

bullCan be presented pre- or post- querybullUsed for Advanced search

Empowers customer to most effectively refine their search

Filters results by predetermined categories

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People Search

Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites

Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
• Group results by social distance
• Subscribe to People Alerts

People Search Results Page

• Find people by project, expertise, or...
• Filter by relevant attributes
• Contact information & online availability

LOB Applications with BDC

• Extracts data from line-of-business, CRM, and other 3rd-party data stores
• Caches for indexing by the search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application

FAST ESP Technology

• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process, in order: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing

Why is it unique?
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news), excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
• Allow users to refine search through filtering
• Define content resources and map to business rules/key concepts
• Focused content = shared understanding = more precise results

Duplicate results filtering
• Collapses duplicates from the same directory or site to leave more room for other relevant results
• Less favoritism, more results on the desired page 1

Definitions
• Automatically extracts "definitions" from indexed content and displays them as matches directly on the results page
• A web part property on the Search Best Bets web part (can turn display of definitions on/off)
• Returned in the Query Object Model
• Cannot be edited

Best Bets
• Editorially assigned results based on key concepts assigned to selected query terms
• Can be many-to-many

Scalability

• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  - i.e., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security

• Query-time stripping: the customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
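
The idea behind query-time stripping can be sketched in a few lines: the engine finds every match in the index, then drops the ones the current user is not allowed to read before anything is displayed. The `IndexedItem` and `SecurityTrimmer` types below are hypothetical stand-ins for illustration; MOSS performs this trimming internally against the access control lists gathered at crawl time, and its real implementation is not shown in these slides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an index entry: a URL plus the groups allowed to read it.
public sealed class IndexedItem
{
    public readonly string Url;
    public readonly HashSet<string> AllowedGroups;

    public IndexedItem(string url, params string[] allowedGroups)
    {
        Url = url;
        AllowedGroups = new HashSet<string>(allowedGroups);
    }
}

public static class SecurityTrimmer
{
    // Keep only the matches that at least one of the user's groups may read.
    public static List<string> Trim(IEnumerable<IndexedItem> matches, IEnumerable<string> userGroups)
    {
        var groups = new HashSet<string>(userGroups);
        return matches
            .Where(item => item.AllowedGroups.Overlaps(groups))
            .Select(item => item.Url)
            .ToList();
    }

    public static void Main()
    {
        var matches = new[]
        {
            new IndexedItem("http://intranet/hr/salaries.xlsx", "HR"),
            new IndexedItem("http://intranet/news/launch.aspx", "Everyone"),
        };

        // A user in "Everyone" and "Sales" sees only the news page.
        foreach (string url in Trim(matches, new[] { "Everyone", "Sales" }))
            Console.WriteLine(url);
    }
}
```

Because trimming happens after matching, a user never learns that a restricted document exists, not even from a result count.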

Search Analytics

Export search logs to Excel
• Query terms
• Page views
• Number of results returned
• Volume trends
• Query success: can define success for certain query terms

Report Center
• Access to MOSS 2007 BI features
• Filters data for permissions and relevance

Key Performance Indicators [KPI]
• Create a KPI list or other measures of success
• Default KPIs exist in OOB deployment
• KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
• Content creators
• Information Architect/User Experience Architect
• Taxonomist

Define key enterprise themes in content
• Map existing content to these themes
• Create filters and scopes to map to themes

Get as much customer data as possible to find search pain points
• Review search logs and customer feedback mechanisms
• What are they trying to find?
• What terms are they using?

Assemble a cross-functional team to
• Assign relevance weighting that makes sense for the customer behavior and the corpus
• Develop Best Bets for searches with 0 results
• Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
• Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
• Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle

• Known as the 80/20 rule
• Named after a late 19th century economist
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
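
One way to check how closely your own search logs follow the 80/20 rule is to measure what share of all queries is covered by the most frequent 20% of distinct query terms. The sketch below assumes the log has already been exported (for example from the Excel export mentioned under Search Analytics) as one query string per entry; `QueryLogStats` and the sample queries are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryLogStats
{
    // Fraction of all logged queries accounted for by the most frequent
    // `topPercent` percent of distinct query strings (the "head").
    public static double HeadCoverage(IEnumerable<string> queries, int topPercent)
    {
        List<int> counts = queries
            .GroupBy(q => q.Trim().ToLowerInvariant())
            .Select(g => g.Count())
            .OrderByDescending(c => c)
            .ToList();
        if (counts.Count == 0) return 0.0;

        // Integer ceiling of counts.Count * topPercent / 100, at least one term.
        int headSize = Math.Max(1, (counts.Count * topPercent + 99) / 100);
        return (double)counts.Take(headSize).Sum() / counts.Sum();
    }

    public static void Main()
    {
        // Made-up log: one popular query and a long tail of one-off queries.
        var log = new List<string>();
        for (int i = 0; i < 8; i++) log.Add("vacation policy");
        log.AddRange(new[] { "401k", "parking", "badge", "expense report" });

        Console.WriteLine(HeadCoverage(log, 20).ToString("P0"));
    }
}
```

The remainder (one minus the head coverage) is the Long Tail: queries that each occur rarely but together still make up a meaningful share of search traffic.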

Define Content

Define content scopes
• Segment content into logical groups
• Create scope rules based on
  - Address
  - Property query
  - Content source
• At the SSP level or individual level
• SSP-level scopes are shared among all sites that use the SSP

Select Authority resources

Define special terms if needed
• Terms or language proprietary to the enterprise
  - i.e., "goat rodeo"
• Provides additional clarification for the searcher
• Use synonym mapping for term variants
  - C# and Csharp
• Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet

Designate Authority Sites

Hilltop Algorithm
• Quality of links more important than quantity of links
• Segmentation of corpus into broad topics
• Selection of authority sources within these topic areas
• Pre-query calculation applied at query time

Topic-Sensitive PageRank
• Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
• Pre-query calculation of factors based on a subset of the corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query
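
Authority sites matter because click distance is measured from them: a page one click away from an authoritative page is treated as closer to the "good" content than a page five clicks away. The sketch below shows the general technique, a breadth-first search over the link graph starting at the authority pages. It is a textbook illustration under assumed names (`ClickDistance`, the sample URLs), not the actual MOSS ranking code.

```csharp
using System;
using System.Collections.Generic;

public static class ClickDistance
{
    // links: page URL -> URLs it links to. Returns the minimum number of
    // clicks needed to reach each page from any authority page.
    public static Dictionary<string, int> FromAuthorities(
        IDictionary<string, string[]> links, IEnumerable<string> authorities)
    {
        var distance = new Dictionary<string, int>();
        var queue = new Queue<string>();

        foreach (string authority in authorities)
        {
            if (distance.ContainsKey(authority)) continue;
            distance[authority] = 0;
            queue.Enqueue(authority);
        }

        // Breadth-first search: the first time a page is reached is its shortest path.
        while (queue.Count > 0)
        {
            string page = queue.Dequeue();
            string[] targets;
            if (!links.TryGetValue(page, out targets)) continue;

            foreach (string target in targets)
            {
                if (distance.ContainsKey(target)) continue;
                distance[target] = distance[page] + 1;
                queue.Enqueue(target);
            }
        }
        return distance;
    }

    public static void Main()
    {
        var links = new Dictionary<string, string[]>
        {
            { "http://intranet/", new[] { "http://intranet/hr/", "http://intranet/it/" } },
            { "http://intranet/hr/", new[] { "http://intranet/hr/benefits.aspx" } },
        };

        foreach (var entry in FromAuthorities(links, new[] { "http://intranet/" }))
            Console.WriteLine(entry.Value + "  " + entry.Key);
    }
}
```

Pages that cannot be reached from any authority page get no distance at all, which is one reason an unconfigured authority list leaves click distance with nothing useful to contribute.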

Educate Structural Influences

File Type Bias
• In order of relevancy (highest to lowest)
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items

Auto Language Detect
• Foreign language results are less relevant than results in the user's language
• English language is always considered as relevant as the user's language

URL Depth and Click Distance
• Short URLs are like prime real estate
• Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
• Keywords separated by hyphens in the URL are good
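
A quick way to audit your own site structure against the URL depth rule is to count the path levels of each URL. The helper below is a sketch with assumed names (`UrlDepth`, the sample intranet URLs); it counts path segments, which matches counting the slashes after the host name, and makes no claim about how MOSS weights the resulting number.

```csharp
using System;

public static class UrlDepth
{
    // Number of path segments after the host name:
    // http://intranet/ -> 0, http://intranet/hr/plan.aspx -> 2.
    public static int Of(string url)
    {
        string path = new Uri(url).AbsolutePath.Trim('/');
        return path.Length == 0 ? 0 : path.Split('/').Length;
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://intranet/",
            "http://intranet/benefits-enrollment.aspx",
            "http://intranet/sites/hr/docs/2008/benefits-enrollment.aspx",
        };

        foreach (string url in urls)
            Console.WriteLine(Of(url) + "  " + url);
    }
}
```

Running this over a list of important pages quickly shows which ones are buried several levels deep and are candidates for a shorter address.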

Educate Content Influences

Anchor Link Text
• Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
• Any file types handled by installed 3rd-party iFilter components which emit hyperlinks

Metadata extraction
• Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
• Auto Description text

Optimized URLs
• Enterprise Search checks URL matching at query time
• If the query matches the host name of a page in the index, that page will display as the first result

Enhanced Search Results

• Synonym Mapping
• Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
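
Conceptually, each keyword you add on the Manage Keywords page ties together a term, its synonyms, an optional definition, and one or more Best Bets, so that a query for any variant surfaces the same editorial results. The sketch below models that relationship with hypothetical types (`KeywordEntry`, `KeywordIndex`) and made-up URLs; it is an illustration of the data shape, not the SharePoint keyword API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one keyword entry as an editor would define it.
public sealed class KeywordEntry
{
    public string Keyword;
    public string Definition;
    public string[] Synonyms = new string[0];
    public string[] BestBetUrls = new string[0];
}

public static class KeywordIndex
{
    // Maps the keyword and every synonym (case-insensitively) to the same entry.
    public static Dictionary<string, KeywordEntry> Build(IEnumerable<KeywordEntry> entries)
    {
        var index = new Dictionary<string, KeywordEntry>(StringComparer.OrdinalIgnoreCase);
        foreach (KeywordEntry entry in entries)
        {
            index[entry.Keyword] = entry;
            foreach (string synonym in entry.Synonyms)
                index[synonym] = entry;
        }
        return index;
    }

    public static void Main()
    {
        var index = Build(new[]
        {
            new KeywordEntry
            {
                Keyword = "C#",
                Definition = "Programming language used for SharePoint development",
                Synonyms = new[] { "Csharp", "C sharp" },
                BestBetUrls = new[] { "http://intranet/dev/csharp-guidelines.aspx" },
            },
        });

        KeywordEntry hit;
        if (index.TryGetValue("csharp", out hit))
            Console.WriteLine(hit.Keyword + ": " + hit.BestBetUrls[0]);
    }
}
```

Reviewing zero-result queries from the search logs against such a list is a practical way to decide which synonyms and Best Bets to add next.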

Hardware Considerations

• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration

• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
  - Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also please note that it is not an easy installation, and that users must read the entire documentation for it before upgrading their portal
  - More people destroy their portal than upgrade it because they skip the documentation and do not install the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2

• Use the Web part that accommodates wildcard search. Found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capacity is predicated on the My Sites profile
  - Employee participation is critical to optimal functionality
• Benefits of click distance are missed if Authority sites are not configured

Dragons 3

• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources

• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for ...: http://www.mossseo.com

Even More Resources

• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  - Presents for search refinement
• httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with existing taxonomy if one exists
• Appends metadata and stores as MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance Property weights

• Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once per second until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
• 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed on the search results
• 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  - Can be used in the query with property:value
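
The crawled-to-managed relationship can be pictured as a lookup table: each managed property lists the crawled properties that feed it, and the same crawled property may feed more than one managed property. The sketch below uses hypothetical names throughout (`PropertyMapper`, `docAuthor`, `ows_Author`), and it assumes that the first mapped crawled property with a value wins; that ordering rule is an assumption for illustration, so check the managed property settings in your own SSP.

```csharp
using System;
using System.Collections.Generic;

public static class PropertyMapper
{
    // mappings: managed property name -> crawled property names, in priority order.
    // crawled:  crawled property name -> value emitted for one document.
    // Assumption: the first mapped crawled property that has a value is used.
    public static Dictionary<string, string> Resolve(
        IDictionary<string, string[]> mappings, IDictionary<string, string> crawled)
    {
        var managed = new Dictionary<string, string>();
        foreach (KeyValuePair<string, string[]> mapping in mappings)
        {
            foreach (string crawledName in mapping.Value)
            {
                string value;
                if (crawled.TryGetValue(crawledName, out value) && !string.IsNullOrEmpty(value))
                {
                    managed[mapping.Key] = value;
                    break;
                }
            }
        }
        return managed;
    }

    public static void Main()
    {
        // Hypothetical mappings; "ows_Author" feeds two managed properties (many-to-many).
        var mappings = new Dictionary<string, string[]>
        {
            { "Author", new[] { "docAuthor", "ows_Author" } },
            { "Contact", new[] { "ows_Author" } },
        };
        var crawled = new Dictionary<string, string> { { "ows_Author", "msweeny" } };

        foreach (var pair in Resolve(mappings, crawled))
            Console.WriteLine(pair.Key + ":" + pair.Value);
    }
}
```

Once a value lands in a managed property, it can be targeted in a keyword query using the friendly name, for example Author:msweeny.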

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Paretorsquos Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • PushPull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 16: Share Point2007 Best Practices Final

Federated Search Import or export federated locations using Federated

Location Definition (FLD) files Incorporates results from outside content sources that

subscribe to OpenSearch 11 Passes the query into the subscribed resource and

returns results into single interface Relevance calculation done according to originating

resource criteria not MOSS 2007 criteria Pre-defined FLD files found at

httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp

Can develop own FLD files if destination subscribes to OpenSearch 11

ndash Day Software has developed a standard connector for LiveLink ECM

People SearchBuild and publish rich personal profiles

Customize personal profile attributes Populate personal profiles using information from Active Directory other

LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory

information Publish personal profiles using MOSS My Sites

Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships

distribution lists and colleagues Group results by social distance Subscribe to People Alerts

People Search Results Page

Find people by project expertise orhellip

Find people by project expertise orhellip

Filter by relevant attributes

Filter by relevant attributes

Contact information amp online availabilityContact information amp online availability

Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search

service Searches any data source

accessible through ADOnet or Web Services

Uses Live Communication Server for connectivity options

Aggregated into a single application

LOB Applications with BDC

FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help

desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-

external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing

Why is it Unique Auto Classification Advanced Linguistics text mining for

concept and relationship mapping Recall Lemmatization synonym

expansion wildcards anti-phrasing phonetic search

Precision Exact word matching exact phrase matching proximity tokenization

Location aware results (retail and news) ndash excellent for mobile search

Recommendation engine Increased capacity100-200 million

documents on 1 server and 150 million qsecond

Custom Results Search Scopes

Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results

Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other

relevant results Less favoritism more results on desired page 1

Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as

matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of

definition) Returned in the Query Object Model Can not be edited

Best Bets Editorially assigned results based on these key concepts assigned to selected

query terms Can be many-to-many

Scalability No physical limit for the maximum number of

documents in one index Recommended document limit is 50 Millions of

documents per indexer A document is anything from a Word or PowerPoint

file to a web page an individual SharePoint list item one people entry or an SAP customer record

Largesmall documents count the same The lsquoaverage document sizersquo depends on the

corpus mixndash ie heavy use of WSS 30 lists versus limited use

Dependent on supporting hardware

Security Query time stripping ndash customer only sees those results

that they have permission to view Support for pluggable authentication for content in

SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model

Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites

Search visibility options Prevent siteslists appearing in search results at a

sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval

Search Analytics Export search logs to Excel

Query terms Page views Number of results returned

Volume trends Query success can define success for

certain query terms Report Center

Access to MOSS 2007 BI features Filters data for permissions and relevance

Key Performance Indicators [KPI] Create a KPI list or other measures of

success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS

2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources

• Microsoft Enterprise Search website
  http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007
  http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008
  http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes)
  http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN
  http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit
  http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet
  http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site
  http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search, from the Microsoft SharePoint Team Blog
  http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

• Enterprise Search blog
  http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search
  http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search
  http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007
  http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007
  http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint
  http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator
  http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for ...
  http://www.mossseo.com

Even More Resources

• MOSS 2007 Administrator Documentation
  http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links
  http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint, S.S. Ahmed
  http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search: creating scopes
  http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization
  http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing
  http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page
  http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with the existing taxonomy if one exists
• Appends metadata and stores it as a MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance: Property Weights

• Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
• Change the default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property and weight are supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and wait until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  - Notifications for search alerts are sent according to the creation time
• 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed on the search results
• 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

• Connect to a content source and enumerate the documents
• Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles and the Business Data Catalog
• Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query Object Model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and a property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by an internal ID
• Friendly name used in queries
  - Can be used in the query as property:value
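
The many-to-many relationship between crawled and managed properties can be sketched as a lookup table. This is an illustration under stated assumptions, not the MOSS schema API: the property names, the "first non-empty value wins" rule and the function `managed_values` are all hypothetical.

```python
# Hypothetical mapping: managed property -> ordered list of crawled properties.
# "mail_from" feeds two managed properties, and "Author" draws on two crawled
# properties, which is what makes the relationship many-to-many.
MAPPINGS = {
    "Author": ["doc_author", "mail_from"],
    "Sender": ["mail_from"],
    "Title": ["doc_title", "page_title"],
}


def managed_values(crawled, mappings=MAPPINGS):
    """Resolve each managed property to the first mapped crawled property with a value."""
    resolved = {}
    for managed, sources in mappings.items():
        for name in sources:
            value = crawled.get(name)
            if value:
                resolved[managed] = value
                break
    return resolved


# Crawled properties emitted for one hypothetical item.
item = {"doc_author": "", "mail_from": "J. Doe", "doc_title": "Enterprise Search"}
print(managed_values(item))
```

A query written as property:value would then be matched against the resolved managed property rather than the raw crawled names.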

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 17: Share Point2007 Best Practices Final

People Search

Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites

Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections, including managers, site memberships, distribution lists and colleagues
• Group results by social distance
• Subscribe to People Alerts

People Search Results Page

[Screenshot callouts: Find people by project, expertise or ...; Filter by relevant attributes; Contact information & online availability]

LOB Applications with BDC

• Extracts data from line-of-business, CRM and other 3rd party data stores
• Caches for indexing by the search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application

FAST ESP Technology

FAST is a sophisticated search engine tailor-made for e-commerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing

Why is it unique?
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news), excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
• Allow users to refine search through filtering
• Define content resources and map them to business rules/key concepts
• Focused content = shared understanding = more precise results

Duplicate results filtering
• Collapses duplicates from the same directory or site to leave more room for other relevant results
• Less favoritism, more results on the desired page 1

Definitions
• Automatically extract "definitions" from indexed content and display them as matches directly on the results page
• A web part property on the Search Best Bets web part turns display of the definition on/off
• Returned in the Query Object Model
• Cannot be edited

Best Bets
• Editorially assigned results, based on key concepts, assigned to selected query terms
• Can be many-to-many
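
Duplicate collapsing can be pictured as keeping only the highest-ranked copy of each distinct document. The slides do not describe how MOSS detects duplicates, so this sketch assumes a simple fingerprint of the normalized body text; the result records, URLs and function names are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Hash the text after lowercasing and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def collapse_duplicates(results):
    """Keep the first (highest-ranked) result for each distinct fingerprint."""
    seen = set()
    kept = []
    for result in results:
        key = fingerprint(result["body"])
        if key not in seen:
            seen.add(key)
            kept.append(result)
    return kept


# Hypothetical results, already in rank order.
results = [
    {"url": "http://intranet.example/hr/policy.doc", "body": "Travel policy 2008"},
    {"url": "http://intranet.example/hr/archive/policy.doc", "body": "travel  policy 2008"},
    {"url": "http://intranet.example/it/vpn.aspx", "body": "VPN setup guide"},
]

for result in collapse_duplicates(results):
    print(result["url"])
```

The archived copy is dropped, which frees a slot on the first results page for a different document.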

Scalability

• No physical limit on the maximum number of documents in one index
• Recommended limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
• Large and small documents count the same
• The 'average document size' depends on the corpus mix
  - e.g. heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security

• Query-time trimming: the customer sees only those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists from appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
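
Query-time trimming can be sketched as a filter applied to ranked results before they are shown. This is a simplified illustration, assuming each indexed item carries a list of groups allowed to read it; the field name `acl`, the group names and the function `trim_results` are hypothetical and do not reflect how MOSS stores permissions.

```python
def trim_results(results, user_groups):
    """Return, in rank order, the results whose ACL shares a group with the user."""
    allowed = set(user_groups)
    return [result for result in results if allowed.intersection(result["acl"])]


# Hypothetical ranked results with the groups permitted to read each item.
results = [
    {"title": "Holiday schedule", "acl": ["Everyone"]},
    {"title": "Salary bands", "acl": ["HR"]},
    {"title": "Build server guide", "acl": ["Engineering", "IT"]},
]

# A user in the Everyone and IT groups never sees the HR-only item.
print([result["title"] for result in trim_results(results, ["Everyone", "IT"])])
```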

Search Analytics

Export search logs to Excel
• Query terms
• Page views
• Number of results returned
• Volume trends
• Query success (can define success for certain query terms)

Report Center
• Access to MOSS 2007 BI features
• Filters data for permissions and relevance

Key Performance Indicators [KPI]
• Create a KPI list or other measures of success
• Default KPIs exist in the OOB deployment
• KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
• Content creators
• Information Architect/User Experience Architect
• Taxonomist

Define key enterprise themes in content
• Map existing content to these themes
• Create filters and scopes that map to the themes

Get as much customer data as possible to find search pain points
• Review search logs and customer feedback mechanisms
• What are they trying to find?
• What terms are they using?

Assemble a cross-functional team to
• Assign relevance weighting that makes sense for customer behavior and the corpus
• Develop Best Bets for searches with 0 results
• Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
• Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
• Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle

• Known as the 80/20 rule
• Named after the late 19th century economist Vilfredo Pareto
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
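
One way to see where the head ends and the Long Tail begins is to count queries in an exported search log. A minimal sketch, assuming the log is simply a list of query strings; the sample log and the function name `head_queries` are made up for illustration.

```python
from collections import Counter


def head_queries(query_log, coverage=0.8):
    """Return the most frequent queries that together reach `coverage` of all searches."""
    counts = Counter(" ".join(query.lower().split()) for query in query_log)
    total = sum(counts.values())
    head, covered = [], 0
    for query, count in counts.most_common():
        if covered / total >= coverage:
            break
        head.append(query)
        covered += count
    return head


# Hypothetical query log: one entry per search issued.
query_log = (
    ["vacation policy"] * 5
    + ["expense report"] * 3
    + ["goat rodeo", "parking map"]
)

head = head_queries(query_log)
tail = sorted(set(query_log) - set(head))
print("head:", head)
print("long tail:", tail)
```

The head is where Best Bets and authority sites pay off first; the tail is what scopes, metadata and synonyms have to serve.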

Define Content

Define content scopes
• Segment content into logical groups
• Create scope rules based on
  - Address
  - Property query
  - Content source
• At the SSP level or individual level
• SSP-level scopes are shared among all sites that use the SSP

Select Authority resources

Define special terms if needed
• Terms or language proprietary to the enterprise
  - e.g. "goat rodeo"
• Provides additional clarification for the searcher
• Use synonym mapping for term variants
  - C# and Csharp
• Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
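
The special-term behavior above (synonyms folding into one term, which then carries a definition and a Best Bet) can be sketched with two lookup tables. This is an illustration only: the dictionaries, placeholder definitions, URLs and the function `lookup_special_term` are hypothetical, not the MOSS keyword store.

```python
# Hypothetical synonym map: query variant -> canonical special term.
SYNONYMS = {"csharp": "c#", "c sharp": "c#"}

# Hypothetical special terms, each with the two information points.
SPECIAL_TERMS = {
    "c#": {
        "definition": "Placeholder definition for the C# term.",
        "best_bet": "http://intranet.example/dev/csharp",
    },
    "goat rodeo": {
        "definition": "Placeholder definition for an enterprise-specific term.",
        "best_bet": "http://intranet.example/glossary/goat-rodeo",
    },
}


def lookup_special_term(query):
    """Return the definition and Best Bet for a query, or None if it is not a special term."""
    term = " ".join(query.lower().split())
    term = SYNONYMS.get(term, term)
    return SPECIAL_TERMS.get(term)


print(lookup_special_term("Csharp"))
print(lookup_special_term("intranet"))
```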

Designate Authority Sites

Hilltop Algorithm
• Quality of links more important than quantity of links
• Segmentation of corpus into broad topics
• Selection of authority sources within these topic areas
• Pre-query calculation applied at query time

Topic-Sensitive PageRank
• Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
• Pre-query calculation of factors based on a subset of the corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query

Educate Structural Influences

File Type Bias
• In order of relevancy (highest to lowest)
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items

Auto Language Detect
• Foreign-language results are less relevant than results in the user's language
• English is always considered as relevant as the user's language

URL Depth and Click Distance
• Short URLs are like prime real estate
• Items with shorter URLs are considered more relevant than items placed at longer URLs
  - The level is determined by counting the slash ("/") characters in the URL
• Keywords separated by hyphens in the URL are good
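
URL depth can be estimated when planning a site structure. The slide describes counting slash characters; this sketch uses the number of non-empty path segments as a close proxy, and the example URLs and the function name `url_depth` are assumptions for illustration.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the non-empty path segments of a URL as a proxy for its depth."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


# Hypothetical intranet URLs.
urls = [
    "http://intranet.example/hr/benefits/2008/dental-plan.aspx",
    "http://intranet.example/",
    "http://intranet.example/hr/benefits.aspx",
]

# Shallowest first: these are the locations the slide calls prime real estate.
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```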

Educate Content Influences

Anchor Link Text
• Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd party iFilter components that emit hyperlinks

Metadata extraction
• Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
• Auto Description text

Optimized URLs
• Enterprise Search checks URL matching at query time
• If the query matches the host name of a page in the index, that page is displayed as the first result

Enhanced Search Results

• Synonym Mapping
• Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for the Indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine
  - In most cases your search index is on its own server

Indexing Configuration

• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2.0 (2003) sites to WSS 3.0 (2007) sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL:
   http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Page 18: Share Point2007 Best Practices Final

People Search Results Page

Find people by project expertise orhellip

Find people by project expertise orhellip

Filter by relevant attributes

Filter by relevant attributes

Contact information amp online availabilityContact information amp online availability

Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search

service Searches any data source

accessible through ADOnet or Web Services

Uses Live Communication Server for connectivity options

Aggregated into a single application

LOB Applications with BDC

FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help

desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-

external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing

Why is it Unique Auto Classification Advanced Linguistics text mining for

concept and relationship mapping Recall Lemmatization synonym

expansion wildcards anti-phrasing phonetic search

Precision Exact word matching exact phrase matching proximity tokenization

Location aware results (retail and news) ndash excellent for mobile search

Recommendation engine Increased capacity100-200 million

documents on 1 server and 150 million qsecond

Custom Results Search Scopes

Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results

Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other

relevant results Less favoritism more results on desired page 1

Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as

matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of

definition) Returned in the Query Object Model Can not be edited

Best Bets Editorially assigned results based on these key concepts assigned to selected

query terms Can be many-to-many

Scalability No physical limit for the maximum number of

documents in one index Recommended document limit is 50 Millions of

documents per indexer A document is anything from a Word or PowerPoint

file to a web page an individual SharePoint list item one people entry or an SAP customer record

Largesmall documents count the same The lsquoaverage document sizersquo depends on the

corpus mixndash ie heavy use of WSS 30 lists versus limited use

Dependent on supporting hardware

Security Query time stripping ndash customer only sees those results

that they have permission to view Support for pluggable authentication for content in

SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model

Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites

Search visibility options Prevent siteslists appearing in search results at a

sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval

Search Analytics Export search logs to Excel

Query terms Page views Number of results returned

Volume trends Query success can define success for

certain query terms Report Center

Access to MOSS 2007 BI features Filters data for permissions and relevance

Key Performance Indicators [KPI] Create a KPI list or other measures of

success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS

2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content
• Define content scopes
  – Segment content into logical groups
  – Create scope rules based on
    – Address
    – Property query
    – Content source
  – At the SSP level or individual level
  – SSP level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  – Terms or language proprietary to the enterprise
    – i.e. "goat rodeo"
  – Provides additional clarification for searcher
• Use synonym mapping for term variants
  – C# and Csharp
• Two information points can be displayed for a special term
  – Definition of the term
  – Best Bet
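
To make the three scope rule types concrete, here is a small Python sketch of how address, property query, and content source rules might decide whether an item belongs to a scope. It is an illustration only: the `Item` type, the rule helpers, and the "any rule matches" behavior are assumptions, and real MOSS scope rules also distinguish include, require, and exclude behaviors.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    url: str
    content_source: str
    properties: dict = field(default_factory=dict)

def address_rule(prefix):
    # Matches items whose URL starts with the given address
    return lambda item: item.url.lower().startswith(prefix.lower())

def property_rule(name, value):
    # Matches items where a managed property equals the given value
    return lambda item: item.properties.get(name) == value

def content_source_rule(source):
    # Matches items crawled from the named content source
    return lambda item: item.content_source == source

def in_scope(item, rules):
    """Simplification: an item is in scope if any rule matches."""
    return any(rule(item) for rule in rules)

hr_scope = [address_rule("http://intranet/hr/"), property_rule("Department", "HR")]

doc = Item("http://intranet/finance/policy.docx", "Intranet", {"Department": "HR"})
other = Item("http://intranet/finance/budget.xlsx", "Intranet", {"Department": "Finance"})
print(in_scope(doc, hr_scope), in_scope(other, hr_scope))
```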

Designate Authority Sites
• Hilltop Algorithm
  – Quality of links more important than quantity of links
  – Segmentation of corpus into broad topics
  – Selection of authority sources within these topic areas
  – Pre-query calculation applied at query time
• Topic Sensitive PageRank
  – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  – Pre-query calculation of factors based on subset of corpus
    – Context of term use in document
    – Context of term use in history of queries
    – Context of term use by user submitting query
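
Designating authority sites matters because click distance is measured from them. The sketch below shows the general idea as a breadth-first search over a link graph: each page gets the fewest clicks needed to reach it from any authority page. The graph, the page names, and the function are hypothetical, and this is not a description of how MOSS computes its ranking internally.

```python
from collections import deque

def click_distances(links, authority_pages):
    """Breadth-first search: fewest clicks from any authority page
    to each reachable page."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical link graph: page -> pages it links to
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/dental"],
    "/it": [],
}
distances = click_distances(links, ["/"])
print(distances)
```

Adding "/hr/benefits" as a second authority page in this toy graph shortens the distance of the dental page from 3 clicks to 1, which is the effect that is lost when no authority sites are configured.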

Educate Structural Influences
• File Type Bias
  – In order of relevancy (highest to lowest)
    – HTML Web pages
    – PowerPoint presentations
    – Word documents
    – Emails
    – XML files
    – Excel spreadsheets
    – Plain text files
    – List items
• Auto Language Detect
  – Foreign language results are less relevant than results in user's language
  – English language is always considered as relevant as user's language
• URL Depth and Click Distance
  – Short URLs are like prime real estate
  – Items with shorter URLs are considered more relevant than items placed in longer URLs
    – The level is determined by reviewing the number of slash ("/") characters in the URL
• Keywords separated by hyphens in the URL are good
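
URL depth as described above can be approximated by counting path segments. This Python sketch counts the non-empty segments after the host name, which corresponds to the slashes that lead to a segment; the sample URLs are invented, and the exact counting rule MOSS uses is not stated in the slides.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Count the non-empty path segments of a URL."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

urls = [
    "http://intranet/hr/benefits/2008/dental-plan.aspx",
    "http://intranet/benefits.aspx",
    "http://intranet/hr/benefits.aspx",
]
# Shallowest URLs first, mirroring the "prime real estate" ordering
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```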

Educate Content Influences
• Anchor Link Text
  – Search indexes the anchor text from the following elements
    – HTML anchor elements
    – SharePoint Services link lists
    – SharePoint Portal Server 2003 listings
    – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
    – Any file types handled by installed 3rd party iFilter components which emit hyperlinks
• Metadata extraction
  – Shadow title detection is provided within the body of the item
    – Primarily based on text formatting features
    – Shadow title is added automatically to the document
    – Weighted the same as the original title
    – Only for Microsoft Office file types
  – Auto Description text
• Optimized URLs
  – Enterprise Search checks URL matching at query time
  – If the query matches the host name of a page in the index, that page will display as the first result

Enhanced Search Results
• Synonym Mapping
• Best Bets
  – Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
• In most cases your search index is on its own server

Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using single content access account per region
• Regularly clean up and review
  – Crawl rules
  – Property and schema
  – Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2
• Use the Web Part that accommodates wildcard search. Found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capacity is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click-distance are missed if Authority sites are not configured

Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files not upgraded

Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
• Concept Searching
  – Auto-classifies documents for MOSS 2007
  – Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
  – Extracts concepts and weights their relevance to searcher query
    – Presents for search refinement
  – http://www.conceptsearching.com/conceptHMSO (insider trading)
• Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with existing taxonomy if one exists
  – Appends metadata and stores as MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features

Adjusting Relevance: Property weights
• Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext( appGuid ));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight' (both are strings)
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users
• Alerts
  – Same alerting infrastructure for WSS and MOSS
    – Timer service is used to handle all alert notifications
  – Frequency can be set to Daily/Weekly
    – Notifications for search alerts will be sent according to the creation time
  – 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  – A rollup of all of a user's alerts for a site collection
    – http://<sitecollection>/_layouts/MySubs.aspx
  – Alert "gotchas"
    – No "My Alerts Summary" web part
    – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
• RSS Feeds
  – Ability to subscribe to an RSS feed on the search results
  – 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;
    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;
    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
• Crawled properties
  – Emitted by iFilters and Protocol Handlers
  – Identified by a property set (GUID) and property ID (name or numeric ID)
• Managed properties
  – Mapping target for crawled properties (many-to-many)
  – Identified by internal ID
  – Friendly name used in queries
    – Can be used in the query with property:Value
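
The many-to-many relationship between crawled and managed properties can be pictured as a lookup table. The Python sketch below is a toy model only: the GUID placeholders, property names, and helper functions are hypothetical, and the `property:Value` matching is a simple case-insensitive comparison rather than the real query engine.

```python
# Hypothetical crawled properties, keyed by (property set GUID, property ID),
# each mapped to one or more managed property names
crawled_to_managed = {
    ("{GUID-OFFICE}", "Author"): ["Author"],
    ("{GUID-WEB}", "meta:author"): ["Author"],
    ("{GUID-OFFICE}", "Title"): ["Title", "DisplayTitle"],
}

def managed_values(crawled_values):
    """Collect crawled property values under the managed properties they map to."""
    managed = {}
    for key, value in crawled_values.items():
        for name in crawled_to_managed.get(key, []):
            managed.setdefault(name, []).append(value)
    return managed

def matches(managed, query):
    """Evaluate a single property:Value query term (case-insensitive exact match)."""
    name, _, wanted = query.partition(":")
    return any(v.lower() == wanted.lower() for v in managed.get(name, []))

doc = managed_values({
    ("{GUID-WEB}", "meta:author"): "Marianne Sweeny",
    ("{GUID-OFFICE}", "Title"): "Enterprise Search",
})
print(matches(doc, "Author:marianne sweeny"), matches(doc, "Author:someone else"))
```

Two different crawled properties feed the same managed property `Author`, and one crawled `Title` feeds two managed properties, which is the many-to-many case the slide refers to.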

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping

LOB Applications with BDC
• Extracts data from line-of-business, CRM, and other 3rd party data stores
• Caches for indexing by search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application

FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
• Why is it unique?
  – Auto classification
  – Advanced linguistics: text mining for concept and relationship mapping
  – Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
  – Precision: exact word matching, exact phrase matching, proximity, tokenization
  – Location-aware results (retail and news), excellent for mobile search
  – Recommendation engine
  – Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results
• Search Scopes
  – Allow users to refine search through filtering
  – Define content resources and map to business rules/key concepts
  – Focused content = shared understanding = more precise results
• Duplicate results filtering
  – Collapsing duplicates from same directory or site to leave more room for other relevant results
  – Less favoritism, more results on desired page 1
• Definitions
  – Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  – A web part property on the Search Best Bets web part (can turn on/off display of definition)
  – Returned in the Query Object Model
  – Cannot be edited
• Best Bets
  – Editorially assigned results based on the key concepts assigned to selected query terms
  – Can be many-to-many
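
Duplicate results filtering can be pictured as grouping ranked results and keeping only the best one per group. In this Python sketch the grouping key (site, directory, content hash) and the precomputed hash values are assumptions made for illustration; the slides do not say how MOSS detects duplicates.

```python
import posixpath
from urllib.parse import urlparse

def collapse_duplicates(results):
    """Keep the first (highest-ranked) result per (site, directory, content hash)."""
    seen = set()
    collapsed = []
    for url, content_hash in results:
        parsed = urlparse(url)
        key = (parsed.netloc, posixpath.dirname(parsed.path), content_hash)
        if key not in seen:
            seen.add(key)
            collapsed.append(url)
    return collapsed

# Hypothetical ranked results: (URL, hash of the document content)
ranked_results = [
    ("http://intranet/hr/policy.docx", "abc"),
    ("http://intranet/hr/policy-copy.docx", "abc"),
    ("http://intranet/it/policy.docx", "abc"),
    ("http://intranet/hr/handbook.docx", "def"),
]
print(collapse_duplicates(ranked_results))
```

The copy in the same directory is collapsed, while the identical file in a different directory is kept, leaving room on page 1 for other results.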

Scalability
• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  – i.e. heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware

Security
• Query-time stripping: customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
• Implements ASP.NET 2.0 authentication model
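
Query-time stripping (security trimming) amounts to filtering the result set against what the current user is allowed to read. The Python sketch below assumes each index entry carries a list of principals with read access and that the user's group memberships are known; both the data shape and the function name are hypothetical.

```python
def trim_results(results, user_groups):
    """Return only results whose access control list shares a principal with the user."""
    allowed = set(user_groups)
    return [r for r in results if allowed & set(r["acl"])]

# Hypothetical index entries with the principals allowed to read each item
results = [
    {"url": "http://intranet/hr/salaries.xlsx", "acl": ["hr-managers"]},
    {"url": "http://intranet/hr/handbook.docx", "acl": ["all-employees"]},
    {"url": "http://intranet/it/runbook.docx", "acl": ["it-staff", "hr-managers"]},
]
visible = trim_results(results, ["all-employees", "it-staff"])
print([r["url"] for r in visible])
```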

Page 20: Share Point2007 Best Practices Final

FAST ESP Technology

FAST is a sophisticated search engine tailor-made for e-commerce and help desk
  – Uses sophisticated linguistic processing
  – Searches structured and unstructured content
Indexing process, in order: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
  – Auto classification
  – Advanced linguistics: text mining for concept and relationship mapping
  – Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
  – Precision: exact word matching, exact phrase matching, proximity, tokenization
  – Location-aware results (retail and news): excellent for mobile search
  – Recommendation engine
  – Increased capacity: 100-200 million documents on 1 server and 150 million q/second

Custom Results

Search Scopes
  – Allow users to refine search through filtering
  – Define content resources and map to business rules/key concepts
  – Focused content = shared understanding = more precise results
Duplicate results filtering
  – Collapsing duplicates from same directory or site to leave more room for other relevant results
  – Less favoritism, more results on desired page 1
Definitions
  – Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  – A web part property on the Search Best Bets web part (can turn on/off display of definition)
  – Returned in the Query Object Model
  – Cannot be edited
Best Bets
  – Editorially assigned results based on these key concepts assigned to selected query terms
  – Can be many-to-many
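The many-to-many relationship between query terms and Best Bets can be sketched in a few lines. This is an illustrative Python model only, not the MOSS keyword API: the class name `BestBets`, its methods, and the sample URLs are assumptions made for the example.

```python
from collections import defaultdict


class BestBets:
    """Many-to-many mapping between query keywords and editorially chosen URLs."""

    def __init__(self):
        self._by_keyword = defaultdict(list)

    def assign(self, keyword, url):
        """Attach a Best Bet URL to a keyword (one keyword may have many URLs)."""
        key = keyword.strip().lower()
        if url not in self._by_keyword[key]:
            self._by_keyword[key].append(url)

    def lookup(self, query):
        """Return the Best Bets for every keyword in the query, without repeats."""
        hits = []
        for term in query.lower().split():
            for url in self._by_keyword.get(term, []):
                if url not in hits:
                    hits.append(url)
        return hits


bets = BestBets()
bets.assign("vacation", "http://hr/leave")      # hypothetical URLs
bets.assign("holiday", "http://hr/leave")       # one URL, two keywords
bets.assign("vacation", "http://hr/calendar")   # one keyword, two URLs
print(bets.lookup("vacation policy"))
```

The same URL is reachable from several keywords and one keyword can surface several URLs, which is what "many-to-many" means on the slide.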

Scalability

No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
  – Large/small documents count the same
The "average document size" depends on the corpus mix
  – e.g., heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware

Security

Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  – Implements ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
  – Still provides the same security trimming functionality
  – Automatically configured for new sites
Search visibility options
  – Prevent sites/lists appearing in search results at a site/list level
"Security only" crawl for single item add/removal
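Query-time security trimming amounts to filtering the result set against the searcher's permissions before anything is displayed. The sketch below is a simplified Python illustration, not how MOSS stores or checks ACLs; the `acl` field, the principal names, and the URLs are hypothetical.

```python
def trim_results(results, user_principals):
    """Return only the results whose ACL contains one of the user's principals."""
    allowed = set(user_principals)
    return [r for r in results if allowed.intersection(r["acl"])]


# Hypothetical index entries: each carries the principals allowed to read it.
results = [
    {"url": "http://portal/hr/salaries.xlsx", "acl": ["HR"]},
    {"url": "http://portal/news/launch.aspx", "acl": ["Everyone"]},
    {"url": "http://portal/it/runbook.docx", "acl": ["IT", "HR"]},
]

visible = trim_results(results, ["Everyone", "IT"])
for r in visible:
    print(r["url"])
```

Because the filter runs per query, a permission change only needs the stored ACL to be refreshed, which is the role the "Security only" crawl plays on the slide.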

Search Analytics

Export search logs to Excel
  – Query terms
  – Page views
  – Number of results returned
  – Volume trends
  – Query success: can define success for certain query terms
Report Center
  – Access to MOSS 2007 BI features
  – Filters data for permissions and relevance
Key Performance Indicators [KPI]
  – Create a KPI list or other measures of success
  – Default KPIs exist in OOB deployment
  – KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
  – Content creators
  – Information Architect/User Experience Architect
  – Taxonomist
Define key enterprise themes in content
  – Map existing content to these themes
  – Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
  – Review search logs and customer feedback mechanisms
  – What are they trying to find?
  – What terms are they using?
Assemble a cross-functional team to
  – Assign relevance weighting that makes sense to the customer behavior and the corpus
  – Develop Best Bets for searches with 0 results
  – Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  – Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  – Design a structure that leverages the structural elements like URL depth and click distance
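Finding candidates for "Best Bets for searches with 0 results" is a small log-analysis task once the search logs are exported. The sketch below assumes, purely for illustration, that each exported row is a `(query_text, result_count)` pair; the real export columns may differ.

```python
from collections import Counter


def zero_result_queries(log_rows, top_n=10):
    """Return the most frequent queries that returned no results.

    log_rows: iterable of (query_text, result_count) pairs (assumed layout).
    Queries are normalized by trimming whitespace and lower-casing.
    """
    counts = Counter(
        query.strip().lower() for query, result_count in log_rows if result_count == 0
    )
    return counts.most_common(top_n)


rows = [("VPN", 0), ("vpn ", 0), ("payroll", 12), ("goat rodeo", 0)]
print(zero_result_queries(rows))
```

The queries at the top of this list are the first ones worth a Best Bet or a synonym entry.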

Pareto's Principle

Known as the 80/20 rule
Named after the late 19th century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
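Whether the 80/20 split actually holds for a given corpus can be checked against the search logs. The sketch below assumes a list of clicked result URLs is available (a hypothetical input, one entry per click) and reports the share of clicks that land on the most-clicked 20% of documents.

```python
from collections import Counter


def head_share(clicked_urls, head_fraction=0.2):
    """Fraction of all clicks that go to the top `head_fraction` of documents."""
    counts = Counter(clicked_urls)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


# Five distinct documents; one of them receives 16 of the 20 clicks.
clicks = ["a"] * 16 + ["b", "c", "d", "e"]
print(head_share(clicks))
```

Everything outside the head is the Long Tail the slide warns about: individually rare, but still worth optimizing.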

Define Content

Define content scopes
  – Segment content into logical groups
  – Create scope rule based on
      – Address
      – Property query
      – Content source
  – At the SSP level or individual level
  – SSP level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
  – Terms or language proprietary to the enterprise, e.g. "goat rodeo"
  – Provides additional clarification for searcher
  – Use synonym mapping for term variants
      – C# and Csharp
  – Two information points can be displayed for a special term
      – Definition of the term
      – Best Bet
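The three scope rule types named on this slide (address, property query, content source) can be modeled as simple predicates over an indexed item. This is a simplified Python illustration, not the MOSS scope object model; `Item`, the rule helpers, and the include/exclude behavior are assumptions, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    url: str
    content_source: str
    properties: dict = field(default_factory=dict)


def address_rule(prefix):
    """Match items whose URL starts with the given address."""
    return lambda item: item.url.lower().startswith(prefix.lower())


def property_rule(name, value):
    """Match items whose property has exactly the given value."""
    return lambda item: item.properties.get(name) == value


def content_source_rule(name):
    """Match items crawled from the named content source."""
    return lambda item: item.content_source == name


def in_scope(item, include_rules, exclude_rules=()):
    """An item is in scope if no exclude rule matches and some include rule does."""
    if any(rule(item) for rule in exclude_rules):
        return False
    return any(rule(item) for rule in include_rules)


leave = Item("http://portal/hr/policies/leave.aspx", "Intranet", {"ContentType": "Policy"})
hr_scope = [address_rule("http://portal/hr/")]
print(in_scope(leave, hr_scope))
```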

Designate Authority Sites

Hilltop Algorithm
  – Quality of links more important than quantity of links
  – Segmentation of corpus into broad topics
  – Selection of authority sources within these topic areas
  – Pre-query calculation applied at query time
Topic Sensitive PageRank
  – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  – Pre-query calculation of factors based on subset of corpus
      – Context of term use in document
      – Context of term use in history of queries
      – Context of term use by user submitting query
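Designated authority sites matter because click distance is measured from them: a page is "closer" the fewer links separate it from an authority page. A minimal sketch of that idea is a breadth-first search over the link graph. This is an illustration in Python, not the MOSS ranking code; the link graph and page names are hypothetical.

```python
from collections import deque


def click_distance(links, authority_pages):
    """Shortest number of clicks from any authority page to each reachable page.

    links: dict mapping a page to the pages it links to.
    Pages that cannot be reached from an authority page are left out.
    """
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


links = {"home": ["hr", "it"], "hr": ["leave"], "it": ["leave", "vpn"]}
print(click_distance(links, ["home"]))
```

With no authority pages configured there is no starting point for the search, so every page ends up without a meaningful distance.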

Educate Structural Influences

File Type Bias
  – In order of relevancy (highest to lowest)
      1. HTML Web pages
      2. PowerPoint presentations
      3. Word documents
      4. Emails
      5. XML files
      6. Excel spreadsheets
      7. Plain text files
      8. List items
Auto Language Detect
  – Foreign language results are less relevant than results in user's language
  – English language is always considered as relevant as user's language
URL Depth and Click Distance
  – Short URLs are like prime real estate
  – Items with shorter URLs are considered more relevant than items placed in longer URLs
      – The level is determined by reviewing the number of slash ("/") characters in the URL
  – Keywords separated by hyphens in the URL are good
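URL depth as described here is just a count of slash characters. The sketch below counts the slashes in the path portion of a URL (ignoring the `//` after the scheme and any trailing slash), which is one reasonable reading of the slide rather than the exact MOSS calculation; the URLs are hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Number of slash characters in the URL path, ignoring a trailing slash."""
    path = urlparse(url).path
    return path.rstrip("/").count("/")


pages = [
    "http://portal/hr/policies/2008/leave-policy.aspx",
    "http://portal/",
    "http://portal/hr/leave-policy.aspx",
]
# Shallowest URLs first: these are the ones the ranker favors.
for page in sorted(pages, key=url_depth):
    print(url_depth(page), page)
```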

Educate Content Influences

Anchor Link Text
  – Search indexes the anchor text from the following elements
      – HTML anchor elements
      – SharePoint Services link lists
      – SharePoint Portal Server 2003 listings
      – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
      – Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
  – Shadow title detection is provided within the body of the item
      – Primarily based on text formatting features
      – Shadow title is added automatically to the document
      – Weighted the same as the original title
      – Only for Microsoft Office file types
  – Auto Description text
Optimized URLs
  – Enterprise Search checks URL matching at query time
  – If query matches the host name of a page in the index, it will display as the first result

Enhanced Search Results

Synonym Mapping
Best Bets
  – Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
  – In most cases your search index is on its own server

Indexing Configuration

Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
  – Balance results freshness with load on servers
Consider using single content access account per region
Regularly clean up and review
  – Crawl rules
  – Property and schema
  – Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here:
  http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
  – More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2

Use the Web Part that accommodates wildcard search. Found here:
  http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile
  – Employee participation critical to optimal functionality
Benefits of click distance are missed if Authority sites are not configured

Dragons 3

The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results delayed from servers without Internet connections
Backward compatibility
  – Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, thesaurus files not upgraded

Resources

Microsoft Enterprise Search website
  http://www.microsoft.com/enterprisesearch
Webcast: Installing and Configuring Search in MOSS 2007
  http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008
  http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes)
  http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN
  http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit
  http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet
  http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site
  http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog
  http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

Enterprise Search blog
  http://blogs.msdn.com/enterprisesearch
MOSS BDC Search
  http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find It All with SharePoint Enterprise Search
  http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007
  http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007
  http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint
  http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator
  http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …
  http://www.mossseo.com

Even More Resources

MOSS 2007 Administrator Documentation
  http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links
  http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed)
  http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes
  http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization
  http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing
  http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page
  http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
  – Auto-classifies documents for MOSS 2007
  – Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
  – Extracts concepts and weights their relevance to searcher query
      – Presents for search refinement
  – http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with existing taxonomy if one exists
  – Appends metadata and stores as MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking

Change default property weights through the Schema Object Model (namespace Microsoft.Office.Server.Search.Administration)

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property and weight are supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Poll once a second until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks

Push/Pull Data to Users

Alerts
  – Same alerting infrastructure for WSS and MOSS
      – Timer service is used to handle all alert notifications
  – Frequency can be set to Daily/Weekly
      – Notifications for search alerts will be sent according to the creation time
  – "Alert Me" link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  – A rollup of all of a user's alerts for a site collection
      – http://<sitecollection>/_layouts/MySubs.aspx
  – Alert "gotchas"
      – No "My Alerts Summary" web part
      – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
RSS Feeds
  – Ability to subscribe for an RSS feed on the search results
  – "RSS" link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
  http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
  – Emitted by iFilters and Protocol Handlers
  – Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
  – Mapping target for crawled properties (many-to-many)
  – Identified by internal ID
  – Friendly name used in queries
      – Can be used in the query with property:value syntax
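The relationship between crawled and managed properties, and the way a managed property's friendly name appears in a query, can be sketched as follows. This is an illustrative Python model, not the MOSS Schema object model: `PropertySchema`, `parse_property_terms`, and the placeholder property-set names (real ones are GUIDs) are assumptions.

```python
from collections import defaultdict


class PropertySchema:
    """Many-to-many mapping from managed properties to crawled properties."""

    def __init__(self):
        self._managed_to_crawled = defaultdict(set)

    def map(self, managed_name, property_set, property_id):
        """Map a crawled property (property set + ID) to a managed property."""
        self._managed_to_crawled[managed_name.lower()].add((property_set, property_id))

    def crawled_for(self, managed_name):
        return set(self._managed_to_crawled.get(managed_name.lower(), set()))

    def managed_for(self, property_set, property_id):
        key = (property_set, property_id)
        return sorted(n for n, c in self._managed_to_crawled.items() if key in c)


def parse_property_terms(query):
    """Split 'author:smith budget' into ({'author': 'smith'}, ['budget']).

    Simplified: any token of the form name:value is treated as a property filter.
    """
    filters, keywords = {}, []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            filters[name.lower()] = value
        else:
            keywords.append(token)
    return filters, keywords


schema = PropertySchema()
schema.map("Author", "office-set", "Author")       # placeholder set names
schema.map("Author", "mail-set", "From")           # two crawled -> one managed
schema.map("Contributor", "office-set", "Author")  # one crawled -> two managed
print(parse_property_terms("author:smith budget 2008"))
```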



WSS alert types

RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the

Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents

Ships with support for Web Content NTFS File Shares Exchange

Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog

Partners providing support for Documentum Hummingbird OpenText

FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd

khtml_introduction_to_a_protocol_handleraspframe=true

The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()

Metadata Property Mapping

Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property

ID (name or numeric ID) Managed properties

Mapping target for crawled properties (many-to-many)

Identified by internal ID Friendly name used in queries

ndash Can be used in the query with property Value

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 22: Share Point2007 Best Practices Final

Scalability
• No physical limit on the maximum number of documents in one index
• Recommended limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large and small documents count the same
• The 'average document size' depends on the corpus mix
  – i.e., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware
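As a rough planning aid, the recommended limit can be turned into a quick estimate. This is a minimal sketch that assumes the corpus can be split evenly across indexers; the function name and constant are illustrative, and real capacity also depends on the supporting hardware.

```python
import math

# Recommended limit quoted on the slide: 50 million documents per indexer.
RECOMMENDED_DOCS_PER_INDEXER = 50_000_000


def indexers_needed(total_documents: int) -> int:
    """Return how many indexers keep each one at or below the recommended limit."""
    if total_documents < 0:
        raise ValueError("total_documents must be non-negative")
    # Even an empty corpus needs one indexer to exist.
    return max(1, math.ceil(total_documents / RECOMMENDED_DOCS_PER_INDEXER))
```

Remember that list items and people entries count as documents too, so count items rather than files when estimating.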

Security
• Query-time stripping (security trimming): the customer sees only those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  – Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  – Still provides the same security trimming functionality
  – Automatically configured for new sites
• Search visibility options
  – Prevent sites/lists from appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
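The idea behind query-time trimming can be sketched in a few lines. This is an illustration only, assuming each indexed item carries the set of groups allowed to read it; the `Result` class, the group names, and the URLs are hypothetical and are not the MOSS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    # Hypothetical ACL captured at crawl time: groups allowed to read the item.
    allowed_groups: set = field(default_factory=set)


def trim_results(results, user_groups):
    """Keep only results whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [r for r in results if r.allowed_groups & user_groups]


results = [
    Result("http://portal/hr/salaries.xlsx", {"HR"}),
    Result("http://portal/news/launch.aspx", {"HR", "Everyone"}),
]
visible = trim_results(results, {"Everyone"})
```

A user who is only in "Everyone" never learns that the salaries workbook exists, which is the point of trimming at query time rather than hiding links in the page.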

Search Analytics
• Export search logs to Excel
  – Query terms
  – Page views
  – Number of results returned
  – Volume trends
  – Query success: can define success for certain query terms
• Report Center
  – Access to MOSS 2007 BI features
  – Filters data for permissions and relevance
• Key Performance Indicators [KPI]
  – Create a KPI list or other measures of success
  – Default KPIs exist in OOB deployment
  – KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
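Once the logs are exported, two numbers are worth pulling out first: query volume and queries that return nothing. A minimal sketch, assuming the export has been reduced to (query text, results returned) pairs; that row layout is an assumption, not the actual export format.

```python
from collections import Counter


def summarize_query_log(rows):
    """rows: iterable of (query_text, results_returned) pairs from an exported log."""
    volume = Counter()
    zero_hits = Counter()
    for query, results_returned in rows:
        term = query.strip().lower()  # fold case so variants count together
        volume[term] += 1
        if results_returned == 0:
            zero_hits[term] += 1
    return volume, zero_hits


log = [
    ("Expense Report", 12),
    ("expense report", 9),
    ("expense report ", 11),
    ("goat rodeo", 0),
    ("goat rodeo", 0),
    ("vacation policy", 3),
]
volume, zero_hits = summarize_query_log(log)
```

The zero-hit counter is a natural input for the Best Bets work described in the roadmap: frequent queries with no results are the first candidates.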

Configuring MOSS 2007 Search

Search Roadmap
• Useful participants
  – Content creators
  – Information Architect/User Experience Architect
  – Taxonomist
• Define key enterprise themes in content
  – Map existing content to these themes
  – Create filters and scopes that map to the themes
• Get as much customer data as possible to find search pain points
  – Review search logs and customer feedback mechanisms
  – What are they trying to find?
  – What terms are they using?
• Assemble a cross-functional team to
  – Assign relevance weighting that makes sense for the customer behavior and the corpus
  – Develop Best Bets for searches with 0 results
  – Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  – Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  – Design a structure that leverages structural elements like URL depth and click distance

Pareto's Principle
• Known as the 80/20 rule
• Named after a late 19th century economist
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
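You can check how closely your own corpus follows the 80/20 rule. The sketch below assumes you have a click (or view) count per document from your search logs; the function name and the sample numbers are made up for illustration.

```python
def share_of_content_for(click_counts, target=0.8):
    """Smallest fraction of documents whose clicks cover `target` of all clicks."""
    counts = sorted(click_counts, reverse=True)  # most-clicked documents first
    total = sum(counts)
    if total == 0:
        return 0.0
    covered = 0
    for used, count in enumerate(counts, start=1):
        covered += count
        if covered >= target * total:
            return used / len(counts)
    return 1.0


clicks = [520, 300, 50, 40, 30, 20, 20, 10, 5, 5]
head_share = share_of_content_for(clicks)
```

In this made-up sample, two of the ten documents cover 82% of the clicks. The other eight are the long tail, and they still account for nearly a fifth of the traffic.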

Define Content
• Define content scopes
  – Segment content into logical groups
  – Create scope rules based on
    • Address
    • Property query
    • Content source
  – At the SSP level or individual level
  – SSP-level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  – Terms or language proprietary to the enterprise
    • e.g., "goat rodeo"
  – Provides additional clarification for the searcher
  – Use synonym mapping for term variants
    • C# and Csharp
  – Two information points can be displayed for a special term
    • Definition of the term
    • Best Bet
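The three rule types above can be pictured as simple predicates over an indexed item. This is a simplified sketch that only models "include" rules; the `Item` class, rule helpers, and sample values are hypothetical and do not reflect how MOSS stores scopes.

```python
from dataclasses import dataclass


@dataclass
class Item:
    url: str
    content_source: str
    properties: dict


def address_rule(prefix):
    """Match items whose URL starts with the given address."""
    return lambda item: item.url.lower().startswith(prefix.lower())


def property_rule(name, value):
    """Match items whose property equals the given value."""
    return lambda item: item.properties.get(name) == value


def content_source_rule(source):
    """Match items crawled from the given content source."""
    return lambda item: item.content_source == source


def in_scope(item, rules):
    """An item belongs to the scope if any rule includes it."""
    return any(rule(item) for rule in rules)


hr_scope = [address_rule("http://portal/hr/"), property_rule("Department", "HR")]
doc = Item("http://portal/finance/benefits.docx", "Intranet", {"Department": "HR"})
```

The sample document lives under the finance site but still lands in the HR scope through its property, which is why property-based rules are useful when content is not filed where searchers expect it.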

Designate Authority Sites
• Hilltop Algorithm
  – Quality of links more important than quantity of links
  – Segmentation of corpus into broad topics
  – Selection of authority sources within these topic areas
  – Pre-query calculation applied at query time
• Topic-Sensitive PageRank
  – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  – Pre-query calculation of factors based on a subset of the corpus
    • Context of term use in document
    • Context of term use in history of queries
    • Context of term use by user submitting query
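Designating authority sites matters because click distance is measured outward from them. The sketch below shows that pre-query calculation as a plain breadth-first search over a link graph; the graph, the function name, and the paths are illustrative assumptions, and this is not the Hilltop algorithm itself.

```python
from collections import deque


def click_distances(links, authority_pages):
    """Breadth-first search: fewest clicks from any authority page to each reachable page."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:  # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/dental.docx"],
}
distances = click_distances(links, ["/"])
```

Pages that no authority page links to, directly or indirectly, get no distance at all, so they gain nothing from this signal.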

Educate Structural Influences
• File Type Bias
  – In order of relevancy (highest to lowest)
    • HTML Web pages
    • PowerPoint presentations
    • Word documents
    • Emails
    • XML files
    • Excel spreadsheets
    • Plain text files
    • List items
• Auto Language Detect
  – Foreign language results are less relevant than results in the user's language
  – English language is always considered as relevant as the user's language
• URL Depth and Click Distance
  – Short URLs are like prime real estate
  – Items with shorter URLs are considered more relevant than items placed at longer URLs
    • The level is determined by counting the number of slash ("/") characters in the URL
  – Keywords separated by hyphens in the URL are good
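Counting slashes is easy to try against your own URLs. A minimal sketch, assuming only the path portion counts (the slashes in "http://" are ignored, which is my simplification); the URLs are hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count the slash characters in the path portion of the URL."""
    path = urlparse(url).path.rstrip("/")  # a trailing slash adds no depth
    return path.count("/")


urls = [
    "http://portal/sites/hr/policies/2008/travel-policy.aspx",
    "http://portal/travel-policy.aspx",
]
ranked = sorted(urls, key=url_depth)  # shallowest first
```

Sorting a site's pages this way is a quick audit: important pages that appear near the bottom of the list are candidates for a shorter address.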

Educate Content Influences
• Anchor Link Text
  – Search indexes the anchor text from the following elements
    • HTML anchor elements
    • SharePoint Services link lists
    • SharePoint Portal Server 2003 listings
    • Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
    • Any file types handled by installed 3rd-party iFilter components that emit hyperlinks
• Metadata extraction
  – Shadow title detection is provided within the body of the item
    • Primarily based on text formatting features
    • Shadow title is added automatically to the document
    • Weighted the same as the original title
    • Only for Microsoft Office file types
  – Auto Description text
• Optimized URLs
  – Enterprise Search checks URL matching at query time
  – If the query matches the host name of a page in the index, that page displays as the first result
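To see what anchor text a page contributes, you can pull the link text out of its HTML. This sketch uses Python's standard `html.parser` and covers HTML anchor elements only; the class name and sample markup are illustrative, and it says nothing about how the MOSS crawler does it internally.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs from HTML anchor elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.anchors.append((self._href, text))
            self._href = None


collector = AnchorTextCollector()
collector.feed(
    '<p>See the <a href="/hr/travel-policy.aspx">travel <b>policy</b></a> '
    'and <a href="/it">IT</a>.</p>'
)
```

Running this over a few key pages shows quickly how many links say "click here" instead of describing their target.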

Enhanced Search Results

• Synonym Mapping
• Best Bets
  – Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
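Conceptually, a keyword entry ties a phrase and its synonyms to a definition and one or more Best Bets. The sketch below models that as a lookup table, assuming a simple case-insensitive match on the whole query; the `Keyword` class and the sample URL are hypothetical, and MOSS's own matching rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    phrase: str
    synonyms: list = field(default_factory=list)
    definition: str = ""
    best_bets: list = field(default_factory=list)  # URLs promoted for this keyword


def build_lookup(keywords):
    """Map the keyword phrase and each synonym (case-insensitive) to its entry."""
    lookup = {}
    for kw in keywords:
        for term in [kw.phrase, *kw.synonyms]:
            lookup[term.lower()] = kw
    return lookup


def best_bets_for(query, lookup):
    """Return the Best Bets for a query, or an empty list if no keyword matches."""
    kw = lookup.get(query.strip().lower())
    return kw.best_bets if kw else []


lookup = build_lookup([
    Keyword("C#", ["Csharp"], "A .NET programming language", ["http://portal/dev/csharp"]),
])
```

Searching for either "C#" or "Csharp" reaches the same entry, so the definition and the Best Bet only need to be maintained once.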

Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
• In most cases, your search index is on its own server

Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
  – Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  – Crawl rules
  – Property and schema
  – Best Bets keywords

Customizing Results Display
To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also note that it is not an easy installation, and that users must read the entire documentation for it before upgrading their portal
  – More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2
• Use the Web Part that accommodates wildcard search. Found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capability is predicated on the My Sites profile
  – Employee participation is critical to optimal functionality
• Benefits of click distance are missed if Authority sites are not configured

Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
• Enterprise Search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (S.S. Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products
Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
• http://www.conceptsearching.com/concept/HMSO (insider trading)

Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with the existing taxonomy if one exists
• Appends metadata and stores it as a MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance
Property weights
• Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Kick off the ranking update and wait until it has finished
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows whereof she speaks.
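To get a feel for why a property weight changes the order of results, here is a toy scoring function. It is a deliberately naive sketch: the weights are made-up numbers, not MOSS defaults, and the real ranking function combines many more signals.

```python
def weighted_score(query_terms, document, weights):
    """Sum, per property, the weight times the number of query terms found there."""
    score = 0.0
    for prop, weight in weights.items():
        words = document.get(prop, "").lower().split()
        score += weight * sum(1 for term in query_terms if term.lower() in words)
    return score


weights = {"title": 5.0, "body": 1.0}  # hypothetical values for illustration
in_title = {"title": "Travel policy", "body": "Rules for booking flights"}
in_body = {"title": "HR handbook", "body": "See the travel policy section"}

title_score = weighted_score(["travel"], in_title, weights)
body_score = weighted_score(["travel"], in_body, weights)
```

With these weights, the document that mentions the term in its title outranks the one that only mentions it in the body. Changing one weight reorders every query at once, which is why it deserves the caution above.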

Push/Pull Data to Users
Alerts
• Same alerting infrastructure for WSS and MOSS
  – Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  – Notifications for search alerts are sent according to the creation time
• 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  – http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  – No "My Alerts Summary" web part
  – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed of the search results
• 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  – Can be used in the query with property:Value
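The many-to-many mapping is easier to see with a small example. In this sketch the property sets are short strings standing in for GUIDs, and the mapping table, function names, and sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mappings: (property set, property ID) -> managed property names.
# Real property sets are GUIDs; short names are used here for readability.
CRAWLED_TO_MANAGED = {
    ("Office", "Author"): ["Author"],
    ("Mail", "From"): ["Author"],
    ("Office", "Title"): ["Title", "DisplayTitle"],
}


def to_managed(crawled_values):
    """Fold crawled property values into managed properties (many-to-many)."""
    managed = defaultdict(list)
    for crawled_key, value in crawled_values.items():
        for name in CRAWLED_TO_MANAGED.get(crawled_key, []):
            managed[name].append(value)
    return dict(managed)


def parse_property_query(query):
    """Split a 'property:Value' query into managed property name and value."""
    name, _, value = query.partition(":")
    return name.strip(), value.strip().strip('"')


doc = to_managed({
    ("Office", "Author"): "M. Sweeny",
    ("Office", "Title"): "Enterprise Search",
})
```

Two crawled properties (a document author and a mail sender) feed the same managed property, so a single Author:"..." query reaches both kinds of content.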

Page 23: Share Point2007 Best Practices Final

Security Query time stripping ndash customer only sees those results

that they have permission to view Support for pluggable authentication for content in

SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model

Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites

Search visibility options Prevent siteslists appearing in search results at a

sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval

Search Analytics Export search logs to Excel

Query terms Page views Number of results returned

Volume trends Query success can define success for

certain query terms Report Center

Access to MOSS 2007 BI features Filters data for permissions and relevance

Key Performance Indicators [KPI] Create a KPI list or other measures of

success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS

2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources

MOSS 2007 Administrator Documentation http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3

SharePoint Search links http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links

All About SharePoint (SS Ahmed) http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx

Working with MOSS search - creating scopes http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx

MOSS 2007 search customization http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx

MOSS 2007 Search & Indexing http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx

Create a custom Search Page http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
- Extracts concepts and weights their relevance to searcher query
  - Presents for search refinement
- httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS
- Extracts metadata and compound terms
- Incorporates with existing taxonomy if one exists
- Appends metadata and stores as MOSS property
- Part of the main MOSS index
- Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking

Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        // The ": " separator is an assumption
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks

Push/Pull Data to Users

Alerts
- Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
- Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
- 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
- A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
- Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types

RSS Feeds
- Ability to subscribe to an RSS feed on the search results
- 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents

Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog

Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others

http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
- Emitted by iFilters and Protocol Handlers
- Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
- Mapping target for crawled properties (many-to-many)
- Identified by internal ID
- Friendly name used in queries
  - Can be used in the query with property:Value
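
To make the last point concrete, here is a minimal sketch of how a keyword query string with property:value filters might be assembled before it is handed to the query object. The helper name build_keyword_query and the property names used (author, filetype) are assumptions for illustration only; they are not part of the MOSS object model, and the managed properties available depend on the farm's schema.

```python
def build_keyword_query(terms, property_filters):
    """Combine free-text terms with property:value filters (hypothetical helper)."""
    parts = list(terms)
    for name, value in property_filters:
        # Quote values containing spaces so they stay a single phrase.
        if " " in value:
            value = '"' + value + '"'
        parts.append(name + ":" + value)
    return " ".join(parts)


# Assumed managed property names; check the schema of the actual farm.
query = build_keyword_query(["budget"], [("author", "Jane Doe"), ("filetype", "xlsx")])
print(query)
```

The resulting string is the kind of text that would be assigned to QueryText in the query object model shown earlier.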

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 24: Share Point2007 Best Practices Final

Search Analytics

Export search logs to Excel
- Query terms
- Page views
- Number of results returned
- Volume trends
- Query success: can define success for certain query terms

Report Center
- Access to MOSS 2007 BI features
- Filters data for permissions and relevance

Key Performance Indicators [KPI]
- Create a KPI list or other measures of success
- Default KPIs exist in OOB deployment
- KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
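
As a rough illustration of what can be done with an exported search log, the sketch below counts query volume and flags queries that returned no results. It assumes the log has been saved as CSV with two columns named query and results; those column names and the sample rows are made up for the example and will differ from a real export.

```python
import csv
import io
from collections import Counter

# Hypothetical export: one row per search, with the query text and result count.
SAMPLE_LOG = """query,results
vacation policy,12
expense report,0
vacation policy,12
Expense Report,0
org chart,3
"""


def summarize(log_file):
    """Return (query volume, zero-result query counts) from a CSV search log."""
    volume = Counter()
    zero_results = Counter()
    for row in csv.DictReader(log_file):
        term = row["query"].strip().lower()  # fold case so variants count together
        volume[term] += 1
        if int(row["results"]) == 0:
            zero_results[term] += 1
    return volume, zero_results


volume, zero_results = summarize(io.StringIO(SAMPLE_LOG))
print(volume.most_common(2))
print(zero_results.most_common())
```

Queries that show up in the zero-result list are natural candidates for Best Bets or synonym mapping.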

Configuring MOSS 2007 Search

Search Roadmap

Useful participants
- Content creators
- Information Architect/User Experience Architect
- Taxonomist

Define key enterprise themes in content
- Map existing content to these themes
- Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points
- Review search logs and customer feedback mechanisms
- What are they trying to find?
- What terms are they using?

Assemble a cross-functional team to
- Assign relevance weighting that makes sense to the customer behavior and the corpus
- Develop Best Bets for searches with 0 results
- Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
- Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
- Design a structure that leverages the structural elements like URL depth and click distance

Pareto's Principle

Known as the 80/20 rule

Named after the late-19th-century economist Vilfredo Pareto

20% of your content is answering 80% of your searches

Not an excuse to stop optimizing at the top 20%

Don't forget the Long Tail
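
One way to check whether the 80/20 pattern holds for a given corpus is to measure how many clicks go to the most-clicked fifth of the documents. The sketch below does that with made-up click counts; the function name top_share and the sample data are assumptions for illustration, and real figures would come from the search logs.

```python
import math


def top_share(clicks_per_doc, fraction=0.2):
    """Share of all clicks that go to the most-clicked `fraction` of documents."""
    counts = sorted(clicks_per_doc.values(), reverse=True)
    top_n = max(1, math.ceil(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Made-up click counts for ten documents.
clicks = {"home": 500, "hr-policy": 300, "benefits": 60, "travel": 40,
          "it-help": 30, "org-chart": 25, "forms": 20, "news": 15,
          "archive": 6, "misc": 4}
print(f"Top 20% of documents receive {top_share(clicks):.0%} of clicks")
```

The remaining documents form the long tail: individually rare, but together still a meaningful share of searches.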

Define Content

Define content scopes
- Segment content into logical groups
- Create scope rule based on
  - Address
  - Property query
  - Content source
- At the SSP level or individual level
- SSP-level scopes are shared among all sites that use the SSP

Select Authority resources

Define special terms if needed
- Terms or language proprietary to the enterprise
  - i.e. "goat rodeo"
- Provides additional clarification for searcher
- Use synonym mapping for term variants
  - C# and Csharp
- Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
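
The three scope rule types listed above (address, property query, content source) can be pictured with a small matching function. This is a simplified sketch under stated assumptions: the item layout, the rule tuples, and the function name in_scope are all hypothetical, and it ignores the include/require/exclude behaviors that real scope rules carry.

```python
def in_scope(item, rules):
    """Return True if the item matches at least one (kind, value) rule."""
    for kind, value in rules:
        if kind == "address" and item["url"].startswith(value):
            return True
        if kind == "property":
            name, expected = value
            if item["properties"].get(name) == expected:
                return True
        if kind == "content_source" and item["content_source"] == value:
            return True
    return False


# Hypothetical indexed item and scope definition.
item = {
    "url": "http://intranet/hr/policies/leave.docx",
    "properties": {"author": "Jane Doe"},
    "content_source": "File shares",
}
hr_scope = [("address", "http://intranet/hr/"), ("property", ("department", "HR"))]
print(in_scope(item, hr_scope))
```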

Designate Authority Sites

Hilltop Algorithm
- Quality of links more important than quantity of links
- Segmentation of corpus into broad topics
- Selection of authority sources within these topic areas
- Pre-query calculation applied at query time

Topic-Sensitive PageRank
- Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
- Pre-query calculation of factors based on subset of corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query
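
Click distance, which depends on the authority sites designated here, can be pictured as the fewest link hops from any authoritative page to a given page. The sketch below computes that with a breadth-first search over a made-up link graph. It illustrates the idea only and is not MOSS's actual ranking computation; the page paths and the function name click_distances are assumptions.

```python
from collections import deque


def click_distances(links, authorities):
    """Fewest link hops from any authority page to each reachable page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path in BFS
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/it/helpdesk", "/hr"],
    "/hr/benefits": ["/hr/benefits/dental"],
}
print(click_distances(links, ["/"]))
```

Pages that cannot be reached from any authority get no distance at all, which is one way to see why leaving authority sites unconfigured wastes this signal.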

Educate Structural Influences

File Type Bias
- In order of relevancy (highest to lowest)
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items

Auto Language Detect
- Foreign language results are less relevant than results in user's language
- English language is always considered as relevant as user's language

URL Depth and Click Distance
- Short URLs are like prime real estate
- Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
- Keywords separated by hyphens in the URL are good
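
The slash-counting idea behind URL depth is easy to sketch. The example below counts slashes in the path portion only (an assumption: it skips the two slashes after the scheme) and sorts a few made-up URLs from shallowest to deepest. The host name intranet and the function name url_depth are illustrative, and the real ranking weighs depth together with other signals.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the slash characters in the path portion of a URL."""
    path = urlparse(url).path
    return path.count("/")


urls = [
    "http://intranet/hr/benefits/2008/dental-plan.aspx",
    "http://intranet/hr",
    "http://intranet/hr/benefits",
]
# Shallower URLs first, mirroring the idea that they are treated as more relevant.
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```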

Educate Content Influences

Anchor Link Text
- Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks

Metadata extraction
- Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types

Auto Description text

Optimized URLs
- Enterprise Search checks URL matching at query time
- If the query matches the host name of a page in the index, that page will display as the first result

Enhanced Search Results

Synonym Mapping

Best Bets

Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

Dedicated crawl-target servers for large sites

Separate SQL Server instance for Search

Fast disk for SQL, fast CPU for Indexer, more memory

Dedicated Web Front End Server for crawling

Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration

Use dedicated web front ends for crawling large farms/sites

Upgrade WSS 2003 sites to WSS 2007 sites to index them faster

Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate
- Balance results freshness with load on servers

Consider using single content access account per region

Regularly clean up and review
- Crawl rules
- Property and schema
- Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here:

http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal

More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2

Use the Web Part that accommodates wildcard search, found here:

http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities

The Expert search capacity is predicated on the My Sites profile; employee participation is critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3

The value of statistical ranking can vary from the partial indexes to the master merge index

Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed

Results delayed from servers without Internet connections

Backward compatibility
- Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
- Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Page 25: Share Point2007 Best Practices Final

Configuring MOSS 2007 Search

Search Roadmap Useful participants

Content creators Information ArchitectUser Experience Architect Taxonomist

Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using

Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the

enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes

and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources MOSS 2007 Administrator Documentation

httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3

SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links

All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx

Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx

MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx

MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx

Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products

Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
Extracts concepts and weights their relevance to searcher query
– Presents for search refinement

httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS
Extracts metadata and compound terms
Incorporates with existing taxonomy if one exists
Appends metadata and stores as MOSS property
Part of the main MOSS index
Uses standard MOSS administration features

Adjusting Relevance

Property weights
Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name).
    // The separator string was lost in extraction; ": " is assumed here.
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    // (property and weight are strings supplied by the caller)
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users

Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts will be sent according to the creation time
'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user's alerts for a site collection
– http://<sitecollection>/_layouts/MySubs.aspx
Alert "gotchas"
– No "My Alerts Summary" web part
– No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types

RSS Feeds
Ability to subscribe to an RSS feed on the search results
'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
Emitted by iFilters and Protocol Handlers
Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID
Friendly name used in queries
– Can be used in the query with property:Value
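The many-to-many relationship between crawled and managed properties can be pictured with a small sketch. This is a toy model, not the MOSS schema API: the class `PropertySchema`, the function `parse_property_query`, and the GUIDs and property names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical crawled properties: (property set GUID, property ID) pairs.
# The GUIDs and IDs below are made up for illustration.
CRAWLED_TITLE = ("00000000-0000-0000-0000-000000000001", "Title")
CRAWLED_SUBJECT = ("00000000-0000-0000-0000-000000000002", 3)
CRAWLED_AUTHOR = ("00000000-0000-0000-0000-000000000002", 4)


class PropertySchema:
    """Toy many-to-many mapping between crawled and managed properties."""

    def __init__(self):
        self._managed_to_crawled = defaultdict(list)
        self._crawled_to_managed = defaultdict(list)

    def map(self, crawled, managed_name):
        """Map one crawled property onto one managed property."""
        if crawled not in self._managed_to_crawled[managed_name]:
            self._managed_to_crawled[managed_name].append(crawled)
            self._crawled_to_managed[crawled].append(managed_name)

    def crawled_for(self, managed_name):
        return list(self._managed_to_crawled.get(managed_name, []))

    def managed_for(self, crawled):
        return list(self._crawled_to_managed.get(crawled, []))


def parse_property_query(text):
    """Split 'Name:Value' terms from free-text keywords."""
    filters, keywords = {}, []
    for token in text.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            filters[name] = value
        else:
            keywords.append(token)
    return filters, keywords


schema = PropertySchema()
schema.map(CRAWLED_TITLE, "Title")
schema.map(CRAWLED_SUBJECT, "Title")    # two crawled -> one managed
schema.map(CRAWLED_SUBJECT, "Subject")  # one crawled -> two managed
schema.map(CRAWLED_AUTHOR, "Author")
```

Here `schema.crawled_for("Title")` returns both crawled sources, and `parse_property_query("Author:Sweeny search relevance")` separates the property filter from the plain keywords.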

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 26: Share Point2007 Best Practices Final

Search Roadmap

Useful participants
Content creators
Information Architect/User Experience Architect
Taxonomist

Define key enterprise themes in content
Map existing content to these themes
Create filters and scopes to map for themes

Get as much customer data as possible to find search pain points
Review search logs and customer feedback mechanisms
What are they trying to find?
What terms are they using?

Assemble a cross-functional team to
Assign relevance weighting that makes sense to the customer behavior and the corpus
Develop Best Bets for searches with 0 results
Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
Design a structure that leverages the structural elements like URL depth and click distance
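Reviewing search logs for zero-result queries is easy to prototype once the log is exported. A minimal sketch, assuming a hypothetical export of (query, result count) rows; the rows and the name `zero_result_queries` are invented for illustration and do not reflect the MOSS query log format.

```python
from collections import Counter

# Hypothetical log rows: (query text, number of results returned).
search_log = [
    ("expense report", 12),
    ("goat rodeo", 0),
    ("Goat Rodeo", 0),
    ("vacation policy", 4),
    ("benefits enrolment", 0),
    ("expense report", 12),
]


def zero_result_queries(log):
    """Return (query, count) pairs for queries that returned nothing,
    most frequent first, after normalising case and whitespace."""
    counts = Counter(
        " ".join(query.lower().split())
        for query, result_count in log
        if result_count == 0
    )
    return counts.most_common()


candidates = zero_result_queries(search_log)
```

The most frequent entries in `candidates` are the first places to look when deciding which searches need a Best Bet.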

Pareto's Principle

Known as the 80/20 rule
Named after the late 19th century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
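Whether the 80/20 split actually holds for a given corpus can be checked against click data. A minimal sketch, assuming a hypothetical list of the documents that searches ended on; the document names, counts, and the function `coverage_of_top` are invented for illustration.

```python
from collections import Counter

# Hypothetical click log: the document each search ended on.
clicked_documents = (
    ["hr/benefits"] * 50 + ["it/vpn-setup"] * 30 +
    ["hr/holidays"] * 6 + ["finance/expenses"] * 5 +
    ["legal/nda"] * 3 + ["it/printers"] * 2 +
    ["hr/parking"] * 1 + ["it/badge"] * 1 +
    ["finance/travel"] * 1 + ["legal/trademark"] * 1
)


def coverage_of_top(clicks, fraction):
    """Share of all clicks that go to the top `fraction` of documents."""
    counts = Counter(clicks)
    ranked = [n for _, n in counts.most_common()]
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


head_share = coverage_of_top(clicked_documents, 0.20)
tail_share = 1 - head_share  # what the Long Tail still answers
```

In this made-up data the top two of ten documents take 80 of 100 clicks; the remaining fifth of the traffic is the Long Tail that still needs attention.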

Define Content

Define content scopes
Segment content into logical groups
Create scope rule based on
– Address
– Property query
– Content source
At the SSP level or individual level
SSP level scopes are shared among all sites that use the SSP

Select Authority resources

Define special terms if needed
Terms or language proprietary to the enterprise
– e.g. "goat rodeo"
Provides additional clarification for searcher
Use synonym mapping for term variants
– C# and Csharp
Two information points can be displayed for a special term
– Definition of the term
– Best Bet
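Synonym mapping for term variants amounts to treating a group of spellings as one term. A minimal sketch of the idea, not of how MOSS stores keywords or its thesaurus; `SYNONYM_GROUPS` and `expand_query` are hypothetical names.

```python
# Hypothetical synonym groups; each set holds variants of one term.
SYNONYM_GROUPS = [
    {"c#", "csharp"},
    {"moss", "microsoft office sharepoint server"},
]


def expand_query(query, groups=SYNONYM_GROUPS):
    """Return the query plus any variants from a matching synonym group."""
    normalised = " ".join(query.lower().split())
    for group in groups:
        if normalised in group:
            # Put the user's own wording first, then the other variants.
            return [normalised] + sorted(group - {normalised})
    return [normalised]
```

A search for "Csharp" would then also be run as "c#", so either spelling reaches the same content.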

Designate Authority Sites

Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time

Topic-Sensitive PageRank
Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
Pre-query calculation of factors based on subset of corpus
– Context of term use in document
– Context of term use in history of queries
– Context of term use by user submitting query

Educate Structural Influences

File Type Bias
In order of relevancy (highest to lowest)
– HTML Web pages
– PowerPoint presentations
– Word documents
– Emails
– XML files
– Excel spreadsheets
– Plain text files
– List items

Auto Language Detect
Foreign language results are less relevant than results in user's language
English language is always considered as relevant as user's language

URL Depth and Click Distance
Short URLs are like prime real estate
Items with shorter URLs are considered more relevant than items placed in longer URLs
– The level is determined by reviewing the number of slash ("/") characters in the URL
Keywords separated by hyphens in the URL are good
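URL depth as described above can be approximated by counting path segments. A minimal sketch under that assumption; it does not reproduce MOSS's internal calculation, and the URLs and function names are invented for illustration.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count path segments, i.e. slashes after the host name."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def url_keywords(url):
    """Split the last path segment into hyphen-separated keywords."""
    path = urlsplit(url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return []
    last = segments[-1].rsplit(".", 1)[0]  # drop a file extension if present
    return [word for word in last.lower().split("-") if word]


# Hypothetical intranet URLs, listed deepest first.
urls = [
    "http://intranet/hr/policies/2008/travel/expense-report-policy.aspx",
    "http://intranet/hr/expense-report-policy.aspx",
    "http://intranet/",
]
by_depth = sorted(urls, key=url_depth)
```

Sorting by `url_depth` puts the site root first and the five-level-deep copy of the policy last, which is the order the slide's "prime real estate" rule favours.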

Educate Content Influences

Anchor Link Text
Search indexes the anchor text from the following elements
– HTML anchor elements
– SharePoint Services link lists
– SharePoint Portal Server 2003 listings
– Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
– Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction
Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text

Optimized URLs
Enterprise Search checks URL matching at query time
If query matches the host name of a page in the index it will display as the first result
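The host-name rule can be illustrated as a stable reordering of an already ranked result list. This is a sketch of the behaviour the slide describes, not MOSS's implementation; the URLs and the function `promote_host_match` are hypothetical.

```python
from urllib.parse import urlsplit


def promote_host_match(query, ranked_urls):
    """Move URLs whose host name equals the query to the front,
    keeping the existing order otherwise (a stable partition)."""
    wanted = query.strip().lower()

    def host_matches(url):
        host = (urlsplit(url).hostname or "").lower()
        # Compare the full host and its first label, so "hrweb"
        # matches both "hrweb" and "hrweb.example.com".
        return wanted in (host, host.split(".")[0])

    matches = [url for url in ranked_urls if host_matches(url)]
    others = [url for url in ranked_urls if not host_matches(url)]
    return matches + others


results = [
    "http://intranet.example.com/hr/benefits.aspx",
    "http://hrweb.example.com/",
    "http://intranet.example.com/news/hrweb-launch.aspx",
]
reordered = promote_host_match("hrweb", results)
```

Only the page hosted on `hrweb` moves to the top; the news page that merely mentions "hrweb" in its path keeps its original position.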

Enhanced Search Results

Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword

Hardware Considerations

Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
In most cases your search index is on its own server

Indexing Configuration

Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using single content access account per region
Regularly clean up and review
– Crawl rules
– Property and schema
– Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

Note the infrastructure update where Microsoft rolled the features of Search Server 2008 into MOSS 2007, which includes federated search ability and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal
More people destroy their portal than upgrade it due to not reading the documentation and not installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2

Use the Web part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile
Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
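Why click distance depends on authority sites is easiest to see in a toy link graph: distance is measured from the authority pages, so with none configured there is nothing to measure from. The sketch below uses a plain breadth-first search as a stand-in; it is not MOSS's ranking code, and the graph and names are invented for illustration.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "portal": ["hr", "it"],
    "hr": ["hr/benefits"],
    "it": ["it/vpn"],
    "hr/benefits": ["hr/benefits/dental"],
    "it/vpn": [],
    "hr/benefits/dental": [],
    "orphan": ["orphan/notes"],
    "orphan/notes": [],
}


def click_distance(links, authority_pages):
    """Breadth-first search from the authority pages; returns the
    minimum number of clicks needed to reach each reachable page."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


with_authority = click_distance(LINKS, ["portal"])
without_authority = click_distance(LINKS, [])
```

With "portal" designated, every page linked from it gets a distance that ranking could use; with no authority pages the result is empty and the signal contributes nothing.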

Dragons 3

The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections

Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files, scopes, search alerts, filters, word breakers, thesaurus files not upgraded

Resources

Microsoft Enterprise Search website http://www.microsoft.com/enterprisesearch

Webcast: Installing and Configuring Search in MOSS 2007 http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US

Tune Search Server 2008 http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html

MOSS Developer Center on MSDN http://msdn.microsoft.com/office/server/moss/default.aspx

MOSS 2007 Software Developers Kit http://msdn2.microsoft.com/en-us/library/ms550992.aspx

MOSS 2007 on TechNet http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx

Page 27: Share Point2007 Best Practices Final

Paretorsquos Principle Known as the 8020 rule

Named after late 19th century economist

20 of your content is answering 80 of your searches

Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail

Define Content Define content scopes

Segment content into logical groups Create scope rule based on

ndash Addressndash Property queryndash Content source

At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP

Select Authority resources Define special terms if needed

Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo

Provides additional clarification for searcher Use synonym mapping for term variants

ndash C and Csharp

Two information points can be displayed for a special termndash Definition of the termndash Best Bet

Designate Authority Sites Hilltop Algorithm

Quality of links more important than quantity of links

Segmentation of corpus into broad topics

Selection of authority sources within these topic areas

Pre-query calculation applied at query time

Topic Sensitive Page Rank Consolidation of Hypertext Induced

Topic Selection [HITS] and PageRank Pre-query calculation of factors

based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting

query

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources MOSS 2007 Administrator Documentation

httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3

SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links

All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx

Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx

MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx

MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx

Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products Concept Searching

Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish

multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher

queryndash Presents for search refinement

httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking

Change default property weights through the Schema Object Model

using MicrosoftOfficeServerSearchAdministration())

Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()

SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks

PushPull Data to Users Alerts

Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications

Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time

lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part

A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx

Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for

WSS alert types

RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the

Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;
    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;
    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
  Emitted by iFilters and Protocol Handlers
  Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
  Mapping target for crawled properties (many-to-many)
  Identified by internal ID
  Friendly name used in queries
  – Can be used in the query with property:Value
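The many-to-many mapping can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the MOSS schema API: the GUIDs, property names, and the `managed_values` and `matches` helpers are made up, and the query check handles only a single exact-match property:value restriction.

```python
from collections import defaultdict

# Hypothetical mappings: (property set GUID, property ID) -> managed property names.
# One crawled property may feed several managed properties, and vice versa.
CRAWLED_TO_MANAGED = {
    ("00000000-0000-0000-0000-000000000001", "Author"): ["Author"],
    ("00000000-0000-0000-0000-000000000002", 4): ["Author", "Contributor"],
}


def managed_values(crawled_item):
    """Collect crawled values under every managed property they map to."""
    managed = defaultdict(list)
    for key, value in crawled_item.items():
        for name in CRAWLED_TO_MANAGED.get(key, []):
            managed[name].append(value)
    return dict(managed)


def matches(managed, query):
    """Evaluate one property:value restriction, case-insensitively."""
    name, _, wanted = query.partition(":")
    return any(wanted.lower() == v.lower() for v in managed.get(name, []))
```

The friendly managed-property name is what the searcher types; the GUID and ID pairs stay hidden behind the mapping.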

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto’s Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 28: Share Point2007 Best Practices Final

Define Content

Define content scopes
  Segment content into logical groups
  Create scope rule based on
  – Address
  – Property query
  – Content source
  At the SSP level or individual level
  SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
  Terms or language proprietary to the enterprise
  – e.g., “goat rodeo”
  Provides additional clarification for searcher
  Use synonym mapping for term variants
  – C# and Csharp
  Two information points can be displayed for a special term
  – Definition of the term
  – Best Bet
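The three scope rule types can be sketched in Python as simple predicates. This is a simplified model, assuming an item is in scope when any rule includes it; the `Item` class, the rule helpers, and the sample URLs are hypothetical and do not reproduce the MOSS scope rule behaviors.

```python
from dataclasses import dataclass


@dataclass
class Item:
    url: str
    content_source: str
    properties: dict


def address_rule(prefix):
    """Include items whose URL starts with the given address."""
    return lambda item: item.url.lower().startswith(prefix.lower())


def property_rule(name, value):
    """Include items whose property equals the given value."""
    return lambda item: item.properties.get(name) == value


def content_source_rule(source):
    """Include items crawled from the named content source."""
    return lambda item: item.content_source == source


def in_scope(item, rules):
    """Simplification: an item is in the scope if any rule includes it."""
    return any(rule(item) for rule in rules)
```

A scope for HR content might combine an address rule for an HR site with a property rule on a department property, so that HR documents stored elsewhere are still included.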

Designate Authority Sites

Hilltop Algorithm
  Quality of links more important than quantity of links
  Segmentation of corpus into broad topics
  Selection of authority sources within these topic areas
  Pre-query calculation applied at query time
Topic Sensitive PageRank
  Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  Pre-query calculation of factors based on subset of corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query
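The deck ties authority sites to click distance. One way to picture that link is a breadth-first search from the designated authority pages, counting the fewest link hops to every other page. The Python sketch below assumes that model; the `click_distances` function and the link graph format are hypothetical, and the real ranking calculation is not described in the slides.

```python
from collections import deque


def click_distances(links, authorities):
    """Breadth-first search: fewest clicks from any authority page to each page.

    links maps a page to the pages it links to; unreachable pages are omitted.
    """
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance
```

Under this model, a page that no authority page links to, directly or indirectly, gets no distance at all, which suggests why unconfigured authority sites leave this signal unused.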

Educate Structural Influences

File Type Bias
  In order of relevancy (highest to lowest)
  – HTML Web pages
  – PowerPoint presentations
  – Word documents
  – Emails
  – XML files
  – Excel spreadsheets
  – Plain text files
  – List items
Auto Language Detect
  Foreign language results are less relevant than results in user’s language
  English language is always considered as relevant as user’s language
URL Depth and Click Distance
  Short URLs are like prime real estate
  Items with shorter URLs are considered more relevant than items placed in longer URLs
  – The level is determined by reviewing the number of slash (“/”) characters in the URL
  Keywords separated by hyphens in the URL are good
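Counting slashes is easy to sketch in Python. The `url_depth` helper below is hypothetical and makes one assumption beyond the slide: it counts slashes only in the path, so the two slashes after the scheme do not add to every URL's depth.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the slash characters in the path portion of a URL."""
    return urlparse(url).path.count("/")


def shallowest_first(urls):
    """Order URLs so that items with fewer path levels come first."""
    return sorted(urls, key=url_depth)
```

For example, `http://intranet/policies.aspx` has a depth of 1 under this count, while `http://intranet/hr/benefits/2008/plan.aspx` has a depth of 4.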

Educate Content Influences

Anchor Link Text
  Search indexes the anchor text from the following elements
  – HTML anchor elements
  – SharePoint Services link lists
  – SharePoint Portal Server 2003 listings
  – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
  Shadow title detection is provided within the body of the item
  – Primarily based on text formatting features
  – Shadow title is added automatically to the document
  – Weighted the same as the original title
  – Only for Microsoft Office file types
  Auto Description text
Optimized URLs
  Enterprise Search checks URL matching at query time
  If the query matches the host name of a page in the index, it will display as the first result

Enhanced Search Results

Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
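A keyword entry with synonyms, a definition, and Best Bets can be modeled as a simple lookup. The Python sketch below is an illustration only: the `KEYWORDS` table, its term, synonyms, definition text, and URL are invented, and exact-match lookup is an assumption rather than documented MOSS behavior.

```python
# Hypothetical keyword table; every value here is made up for illustration.
KEYWORDS = {
    "c#": {
        "synonyms": {"csharp", "c sharp"},
        "definition": "Programming language for the .NET Framework",
        "best_bets": ["http://intranet/dev/csharp-guidelines"],
    },
}


def lookup_special_term(query):
    """Return the keyword entry whose term or synonym equals the query, if any."""
    normalized = query.strip().lower()
    for term, entry in KEYWORDS.items():
        if normalized == term or normalized in entry["synonyms"]:
            return entry
    return None
```

A searcher who types a variant spelling would then see the same definition and Best Bet as one who types the main term.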

Hardware Considerations

Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
In most cases your search index is on its own server

Indexing Configuration

Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using single content access account per region
Regularly clean up and review
  Crawl rules
  Property and schema
  Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part’s XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they skip the documentation and the prerequisite patches.
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming

Dragons 2

Use the Web part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
The Expert search capacity is predicated on the My Sites profile. Employee participation is critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured

Dragons 3

The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
  Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources

Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources

MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (S.S. Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
  Auto-classifies documents for MOSS 2007
  Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
  Extracts concepts and weights their relevance to searcher query
  – Presents for search refinement
  http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
  Extracts metadata and compound terms
  Incorporates with existing taxonomy if one exists
  Appends metadata and stores as MOSS property
  Part of the main MOSS index
  Uses standard MOSS administration features



Page 30: Share Point2007 Best Practices Final

Educate Structural Influences File Type Bias

In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items

Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language

URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed

in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the

URL

Keywords separated by hyphens in the URL are good

Educate Content Influences Anchor Link Text

Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks

Any file types handled by installed 3rd party iFilter components which emit hyperlinks

Metadata extraction Shadow title detection is provided within the body of the item

ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types

Auto Description text Optimized URLs

Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as

the first result

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations

• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine

In most cases, your search index is on its own server

Indexing Configuration

• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review:
  – Crawl rules
  – Property and schema
  – Best Bets keywords

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.

Here There Be Dragons

Dragons 1

• Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
  http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
  – It is not an easy installation; read the entire documentation for it before upgrading your portal
  – More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming

Dragons 2

• Use the Web Part that accommodates wildcard search. Found here:
  http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click-distance are missed if Authority sites are not configured
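Why click-distance depends on authority sites can be seen in a minimal Python sketch that treats click distance as the number of link hops from the nearest authority page, computed by breadth-first search. This is an assumed model for illustration, not the MOSS ranking code; the link graph and page names are hypothetical.

```python
from collections import deque


def click_distances(links, authorities):
    """Return the fewest link hops from any authority page to each reachable page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                # First visit in a breadth-first search is the shortest path.
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance
```

With no authority pages the result is empty, so no page receives a click-distance signal at all, which is the situation the last bullet warns about.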

Dragons 3

• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources

• Microsoft Enterprise Search website
  http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007
  http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008
  http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes)
  http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN
  http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit
  http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet
  http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site
  http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog
  http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

• Enterprise search blog
  http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search
  http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search
  http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007
  http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007
  http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint
  http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator
  http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …
  http://www.mossseo.com

Even More Resources

• MOSS 2007 Administrator Documentation
  http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links
  http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint, SS Ahmed
  http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes
  http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization
  http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing
  http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page
  http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with the existing taxonomy if one exists
• Appends metadata and stores it as a MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features

Adjusting Relevance Property weights

• Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and wait until it finishes
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  – Timer service is used to handle all alert notifications
• Frequency can be set to Daily/Weekly
  – Notifications for search alerts are sent according to the creation time
• 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  – http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  – No "My Alerts Summary" web part
  – No upgrade path from SPS2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed on the search results
• 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  – Can be used in the query with property:Value
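The many-to-many relationship between crawled and managed properties, and the property:Value query form, can be sketched in Python. This is a toy model, not the MOSS schema API: the class, function, and property-set names are hypothetical, and property IDs are kept as strings for simplicity.

```python
from collections import defaultdict


class PropertySchema:
    """Toy many-to-many map between crawled and managed properties."""

    def __init__(self):
        self._by_managed = defaultdict(set)
        self._by_crawled = defaultdict(set)

    def map(self, crawled, managed):
        # crawled is a (property set, property ID) pair; managed is a friendly name.
        self._by_managed[managed].add(crawled)
        self._by_crawled[crawled].add(managed)

    def crawled_for(self, managed):
        return sorted(self._by_managed.get(managed, set()))

    def managed_for(self, crawled):
        return sorted(self._by_crawled.get(crawled, set()))


def parse_property_term(term):
    """Split a 'property:Value' query term; return None for a plain keyword."""
    name, sep, value = term.partition(":")
    if not sep or not name or not value:
        return None
    return name, value
```

One managed property such as Author can collect values from several crawled properties, and one crawled property can feed more than one managed property; a query term like Author:Sweeny then filters on the managed property's friendly name.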

Page 32: Share Point2007 Best Practices Final

Enhanced Search Results

Synonym Mapping Best Bets

Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt

Hardware Considerations Dedicated crawl-target servers for large

sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer

more memory Dedicated Web Front End Server for

crawling Separate indexer machine

In most cases your search index is on its own server

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources MOSS 2007 Administrator Documentation

httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3

SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links

All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx

Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx

MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx

MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx

Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products Concept Searching

Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish

multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher

queryndash Presents for search refinement

httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking

Change default property weights through the Schema Object Model

using MicrosoftOfficeServerSearchAdministration())

Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()

SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks

Push/Pull Data to Users

Alerts
• Same alerting infrastructure for WSS and MOSS
  – The Timer service handles all alert notifications
• Frequency can be set to Daily/Weekly
  – Notifications for search alerts are sent according to the alert's creation time
• The 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
• A rollup of all of a user's alerts for a site collection
  – http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  – No "My Alerts Summary" web part
  – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds
• Ability to subscribe to an RSS feed of the search results
• The 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers
• Connect to a content source and enumerate the documents
• Ship with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
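The idea of "connect to a content source and enumerate the documents" can be sketched as follows. This is a conceptual model only: `IContentSourceHandler`, `InMemoryHandler` and `CrawlerSketch` are hypothetical names, not the actual protocol handler interfaces, and the in-memory document list stands in for a real server.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface standing in for
// "connect to a content source and enumerate its documents".
public interface IContentSourceHandler
{
    string Scheme { get; }
    IEnumerable<string> EnumerateDocuments(Uri startAddress);
}

// In-memory stand-in so the sketch runs without any server or share.
public sealed class InMemoryHandler : IContentSourceHandler
{
    private readonly string _scheme;
    private readonly List<string> _documentUrls;

    public InMemoryHandler(string scheme, IEnumerable<string> documentUrls)
    {
        _scheme = scheme;
        _documentUrls = new List<string>(documentUrls);
    }

    public string Scheme { get { return _scheme; } }

    public IEnumerable<string> EnumerateDocuments(Uri startAddress)
    {
        foreach (string url in _documentUrls)
        {
            // Only documents under the start address belong to this content source.
            if (url.StartsWith(startAddress.AbsoluteUri, StringComparison.OrdinalIgnoreCase))
            {
                yield return url;
            }
        }
    }
}

// Picks a handler by URL scheme, the way a crawler picks a protocol handler.
public sealed class CrawlerSketch
{
    private readonly Dictionary<string, IContentSourceHandler> _handlers =
        new Dictionary<string, IContentSourceHandler>(StringComparer.OrdinalIgnoreCase);

    public void Register(IContentSourceHandler handler)
    {
        _handlers[handler.Scheme] = handler;
    }

    public List<string> Crawl(Uri startAddress)
    {
        IContentSourceHandler handler;
        if (!_handlers.TryGetValue(startAddress.Scheme, out handler))
        {
            throw new NotSupportedException("No handler for scheme: " + startAddress.Scheme);
        }
        return new List<string>(handler.EnumerateDocuments(startAddress));
    }
}

public static class CrawlerDemo
{
    public static void Main()
    {
        var crawler = new CrawlerSketch();
        crawler.Register(new InMemoryHandler("http", new[]
        {
            "http://intranet/docs/plan.docx",
            "http://intranet/docs/budget.xlsx",
            "http://intranet/hr/policy.pdf"
        }));

        List<string> found = crawler.Crawl(new Uri("http://intranet/docs/"));
        Console.WriteLine(found.Count); // 2
    }
}
```

A content source whose scheme has no registered handler fails fast here, which mirrors why partner handlers are needed for repositories the product does not cover out of the box.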

The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties
• Emitted by iFilters and Protocol Handlers
• Identified by a property set (GUID) and a property ID (name or numeric ID)

Managed properties
• Mapping target for crawled properties (many-to-many)
• Identified by internal ID
• Friendly name used in queries
  – Can be used in the query as property:value
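The many-to-many relationship between crawled and managed properties, and the property:value query form, can be pictured with a small in-memory model. This is a sketch under stated assumptions: `PropertyMapSketch` and its methods are hypothetical and are not the MOSS Schema object model, and the GUID is a placeholder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the mapping; not the MOSS Schema object model.
public sealed class PropertyMapSketch
{
    // Managed property friendly name -> keys of the crawled properties mapped to it.
    private readonly Dictionary<string, HashSet<string>> _crawledByManaged =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // A crawled property is identified by its property set GUID plus a name or numeric ID.
    public static string CrawledKey(Guid propertySet, string nameOrId)
    {
        return propertySet.ToString("D") + ":" + nameOrId;
    }

    public void Map(string crawledKey, string managedName)
    {
        HashSet<string> crawled;
        if (!_crawledByManaged.TryGetValue(managedName, out crawled))
        {
            crawled = new HashSet<string>();
            _crawledByManaged[managedName] = crawled;
        }
        crawled.Add(crawledKey);
    }

    // Many-to-many: one crawled property may feed several managed properties.
    public List<string> ManagedFor(string crawledKey)
    {
        var result = new List<string>();
        foreach (KeyValuePair<string, HashSet<string>> entry in _crawledByManaged)
        {
            if (entry.Value.Contains(crawledKey))
            {
                result.Add(entry.Key);
            }
        }
        result.Sort(StringComparer.Ordinal);
        return result;
    }

    // Builds a property:value restriction, quoting values that contain spaces.
    public static string PropertyQuery(string managedName, string value)
    {
        string quoted = value.Contains(" ") ? "\"" + value + "\"" : value;
        return managedName + ":" + quoted;
    }
}

public static class PropertyMapDemo
{
    public static void Main()
    {
        Guid propertySet = Guid.NewGuid(); // placeholder GUID
        var map = new PropertyMapSketch();
        string crawledAuthor = PropertyMapSketch.CrawledKey(propertySet, "Author");
        string crawledCreator = PropertyMapSketch.CrawledKey(propertySet, "4");

        map.Map(crawledAuthor, "Author");
        map.Map(crawledCreator, "Author");      // many crawled -> one managed
        map.Map(crawledAuthor, "Contributor");  // one crawled -> many managed

        Console.WriteLine(string.Join(", ", map.ManagedFor(crawledAuthor))); // Author, Contributor
        Console.WriteLine(PropertyMapSketch.PropertyQuery("Author", "Marianne Sweeny"));
    }
}
```

The friendly name is the only part a searcher ever types; the GUID and property ID stay on the crawl side of the mapping.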

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping

Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine

In most cases, your search index is on its own server

Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review:
  – Crawl rules
  – Properties and schema
  – Best Bets and keywords
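Two of the points above, crawler impact rules and off-hours scheduling, reduce to simple decisions that can be sketched in code. This is an illustration only: `ImpactRule` and `CrawlPolicySketch` are hypothetical types, not MOSS administration classes, and the host pattern and request counts are made-up values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rule: cap simultaneous requests for hosts matching a pattern.
public sealed class ImpactRule
{
    public readonly string HostPattern;       // "*.example.com" or an exact host name
    public readonly int SimultaneousRequests;

    public ImpactRule(string hostPattern, int simultaneousRequests)
    {
        HostPattern = hostPattern;
        SimultaneousRequests = simultaneousRequests;
    }

    public bool Matches(string host)
    {
        if (HostPattern.StartsWith("*.", StringComparison.Ordinal))
        {
            // "*.example.com" matches any host ending in ".example.com".
            return host.EndsWith(HostPattern.Substring(1), StringComparison.OrdinalIgnoreCase);
        }
        return string.Equals(host, HostPattern, StringComparison.OrdinalIgnoreCase);
    }
}

public static class CrawlPolicySketch
{
    // First matching rule wins; otherwise fall back to the default.
    public static int RequestsFor(string host, IList<ImpactRule> rules, int defaultRequests)
    {
        foreach (ImpactRule rule in rules)
        {
            if (rule.Matches(host))
            {
                return rule.SimultaneousRequests;
            }
        }
        return defaultRequests;
    }

    // True when 'time' falls inside the crawl window; the window may span midnight.
    public static bool IsOffHours(TimeSpan time, TimeSpan windowStart, TimeSpan windowEnd)
    {
        if (windowStart <= windowEnd)
        {
            return time >= windowStart && time < windowEnd;
        }
        return time >= windowStart || time < windowEnd;
    }
}

public static class CrawlPolicyDemo
{
    public static void Main()
    {
        var rules = new List<ImpactRule> { new ImpactRule("*.example.com", 2) };
        Console.WriteLine(CrawlPolicySketch.RequestsFor("portal.example.com", rules, 8)); // 2

        // 23:30 is inside a 22:00 to 06:00 window.
        Console.WriteLine(CrawlPolicySketch.IsOffHours(
            new TimeSpan(23, 30, 0), new TimeSpan(22, 0, 0), new TimeSpan(6, 0, 0))); // True
    }
}
```

The trade-off named on the slide shows up directly: a lower request cap or a narrower window protects the crawled servers but delays how quickly new content reaches the index.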

Customizing Results Display

To access the XSL property of the Search Core Results Web Part:

1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx

2. Click the Site Actions link, and then click Edit Page.

3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.

4. Click Data Form Web Part to display the XSL Editor node.

5. Click the Source Editor button.

6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.

7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
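When the XSLT lives in a file, as steps 6 and 7 suggest, it can be tried against sample XML before it is pasted back. The sketch below uses the .NET `XslCompiledTransform` class; the `All_Results/Result/title/url` shape of the sample XML is an assumption, so compare it with the raw XML your own results page emits, and the stylesheet is a stand-in rather than the Web Part's default XSL.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XslPreviewSketch
{
    // Applies an XSLT 1.0 stylesheet to an XML string and returns the output text.
    public static string Apply(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(styleReader);
        }

        using (XmlReader inputReader = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(inputReader, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Assumed sample shape; replace with XML captured from your own results page.
        string sampleXml =
            "<All_Results>" +
            "<Result><title>Travel policy</title><url>http://intranet/hr/travel.docx</url></Result>" +
            "</All_Results>";

        // Stand-in stylesheet: one line of text per result.
        string xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/'>" +
            "<xsl:for-each select='All_Results/Result'>" +
            "<xsl:value-of select='title'/><xsl:text> | </xsl:text><xsl:value-of select='url'/>" +
            "</xsl:for-each>" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        Console.WriteLine(Apply(sampleXml, xslt));
    }
}
```

A stylesheet that fails to compile throws here, on your desk, rather than breaking the live results page.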

Here There Be Dragons

Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
  http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Note also that it is not an easy installation: read the entire documentation for it before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
• Schedule the incremental crawl so that it catches additions to the document set
• Turn on the PDF indexer and stemming

Dragons 2
• Use the Web part that accommodates wildcard search. Found here:
  http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Special characters in the thesaurus can lead to highly irrelevant results and affect "did you mean" capabilities
• Expert search depends on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click-distance are missed if Authority sites are not configured

Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
• Results are delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded

Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search, from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint, S.S. Ahmed: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)

Page 34: Share Point2007 Best Practices Final

Indexing Configuration Use dedicated web front ends for crawling large

farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index

them faster Define Crawler Impact Rules to avoid site overload

Schedule for off-hours crawling where appropriate Balance results freshness with load on servers

Consider using single content access account per region

Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources MOSS 2007 Administrator Documentation

httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3

SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links

All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx

Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx

MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx

MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx

Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products Concept Searching

Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish

multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher

queryndash Presents for search refinement

httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking

Change default property weights through the Schema Object Model

using MicrosoftOfficeServerSearchAdministration())

Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()

SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks

PushPull Data to Users Alerts

Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications

Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time

lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part

A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx

Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for

WSS alert types

RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the

Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connects to a content source and enumerates the documents

Ships with support for Web Content NTFS File Shares Exchange

Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog

Partners providing support for Documentum Hummingbird OpenText

FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd

khtml_introduction_to_a_protocol_handleraspframe=true

The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()

Metadata Property Mapping

Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property

ID (name or numeric ID) Managed properties

Mapping target for crawled properties (many-to-many)

Identified by internal ID Friendly name used in queries

ndash Can be used in the query with property Value

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Paretorsquos Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • PushPull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 35: Share Point2007 Best Practices Final

Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part

1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx

2 Click the Site Actions link and then click Edit Page

3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane

4 Click Data Form Web Part to display the XSL Editornode

5 Click the Source Editor button

6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005

7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part

Here There Be Dragons

Dragons 1 Note the infrastructure update where Microsoft rolled

the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here

httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx

Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not

reading the documentation and installing the prerequisite patches

Must ensure a schedule for the incremental crawl to catch additions to the document set

Must turn on PDF indexer and stemming

Dragons 2 Use the Web part to accommodates wildcard

search Found here

httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx

Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities

The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality

Benefits of click-distance are missed if Authority sites are not configured

Dragons 3 The value of statistical ranking can vary from the partial

indexes to the master merge index Without authoritative sites configured in the relevance

settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Index files scopes search alerts filters word breakers thesaurus files not upgraded

Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model

Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007

httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US

Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml

MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx

MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx

MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx

Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx

Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open

More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search

httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx

Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx

Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml

Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf

Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies

Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14

SEO Advice from a Propellerhead for hellip httpwwwmossseocom

Even More Resources MOSS 2007 Administrator Documentation

httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3

SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links

All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx

Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx

MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx

MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx

Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx

Appendix

Auto Classification Products Concept Searching

Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish

multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher

queryndash Presents for search refinement

httpwwwconceptsearchingcomconceptHMSO (insider trading)

Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features

Adjusting Relevance Property weights

Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking

Change default property weights through the Schema Object Model

using MicrosoftOfficeServerSearchAdministration())

Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()

SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks

Push/Pull Data to Users

Alerts

  • Same alerting infrastructure for WSS and MOSS
      – Timer service is used to handle all alert notifications
  • Frequency can be set to Daily/Weekly
      – Notifications for search alerts will be sent according to the creation time
  • ‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  • A rollup of all of a user’s alerts for a site collection
      – http://<sitecollection>/_layouts/MySubs.aspx
  • Alert “gotchas”
      – No “My Alerts Summary” web part
      – No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for WSS alert types

RSS Feeds

  • Ability to subscribe to an RSS feed of the search results
  • ‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

  • Connects to a content source and enumerates the documents
  • Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles and Business Data Catalog
  • Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
  • http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
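The idea of connecting to a content source and enumerating its documents can be sketched as follows. This is a conceptual Python sketch for a file-share-like source, not the protocol handler interface that MOSS actually uses; `FileShareHandler` and `enumerate_documents` are hypothetical names chosen for the example.

```python
import os
import tempfile


class FileShareHandler:
    """Hypothetical handler that walks a directory tree like a file share."""

    def __init__(self, root):
        self.root = root

    def enumerate_documents(self):
        """Yield (path, last_modified) for every file under the root."""
        for folder, _subfolders, files in os.walk(self.root):
            for name in sorted(files):
                path = os.path.join(folder, name)
                yield path, os.path.getmtime(path)


# Build a tiny throwaway "content source" so the sketch runs anywhere.
source = tempfile.mkdtemp()
for name in ("a.txt", "b.pdf"):
    with open(os.path.join(source, name), "w") as handle:
        handle.write("sample")

found = [os.path.basename(path) for path, _ in FileShareHandler(source).enumerate_documents()]
print(found)
```

A real handler would also have to deal with authentication and security descriptors; the sketch leaves those out to keep the enumeration step visible.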

The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site is the site to query against; strQuery is the user's query text
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping

Crawled properties

  • Emitted by iFilters and Protocol Handlers
  • Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties

  • Mapping target for crawled properties (many-to-many)
  • Identified by internal ID
  • Friendly name used in queries
      – Can be used in the query with property:Value
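The many-to-many relationship between crawled and managed properties can be pictured with a small sketch. It is plain Python rather than the MOSS schema API; the GUIDs, property names, and the `to_managed` helper are placeholders invented for illustration.

```python
from collections import defaultdict

# Hypothetical crawled properties, keyed by (property set GUID, property ID).
# The GUIDs and names are placeholders, not real MOSS values.
CRAWLED = {
    ("00000000-0000-0000-0000-000000000001", "Title"): "Q3 Budget",
    ("00000000-0000-0000-0000-000000000002", 4): "J. Smith",
    ("00000000-0000-0000-0000-000000000003", "ows_Author"): "Jane Smith",
}

# Many-to-many: one crawled property may feed several managed properties,
# and one managed property may draw on several crawled properties.
MAPPINGS = [
    (("00000000-0000-0000-0000-000000000001", "Title"), "Title"),
    (("00000000-0000-0000-0000-000000000002", 4), "Author"),
    (("00000000-0000-0000-0000-000000000003", "ows_Author"), "Author"),
    (("00000000-0000-0000-0000-000000000003", "ows_Author"), "Contributor"),
]


def to_managed(crawled, mappings):
    """Collect crawled values under each managed property's friendly name."""
    managed = defaultdict(list)
    for crawled_key, managed_name in mappings:
        if crawled_key in crawled:
            managed[managed_name].append(crawled[crawled_key])
    return dict(managed)


managed = to_managed(CRAWLED, MAPPINGS)
print(managed)
```

Here the managed property Author collects values from two crawled properties, while one crawled property feeds both Author and Contributor.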

Page 36: Share Point2007 Best Practices Final

Here There Be Dragons

Page 37: Share Point2007 Best Practices Final

Dragons 1

  • Note the infrastructure update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
  • Also note that it is not an easy installation: users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
  • Must schedule the incremental crawl to catch additions to the document set
  • Must turn on the PDF indexer and stemming
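To make the point about incremental crawls concrete, here is a minimal sketch of what an incremental pass does conceptually: it picks up only the documents added or changed since the previous crawl. It is ordinary Python with made-up document names and dates, not the MOSS crawler or its scheduling API.

```python
from datetime import datetime


def incremental_crawl(documents, last_crawl):
    """Return names of documents added or modified since the last crawl."""
    return sorted(name for name, modified in documents.items() if modified > last_crawl)


# Hypothetical document set: name -> last-modified time.
documents = {
    "handbook.pdf": datetime(2009, 1, 5, 9, 0),
    "budget.xls": datetime(2009, 2, 2, 14, 30),
    "minutes.doc": datetime(2009, 2, 3, 8, 15),
}

last_crawl = datetime(2009, 2, 1, 0, 0)
changed = incremental_crawl(documents, last_crawl)
print(changed)
```

If no incremental crawl is scheduled, the two newer documents in this example stay out of the index until the next full crawl.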

Page 38: Share Point2007 Best Practices Final

Dragons 2

  • Use the Web part that accommodates wildcard search. Found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
  • Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
  • The Expert search capability is predicated on the My Sites profile. Employee participation is critical to optimal functionality
  • Benefits of click-distance are missed if Authority sites are not configured
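The dependence of click-distance on authority sites can be sketched with a breadth-first search over a link graph. This is a conceptual illustration in Python; the deck does not describe how MOSS computes click-distance internally, and the page names and the `click_distances` function are assumptions.

```python
from collections import deque


def click_distances(links, authority_sites):
    """Breadth-first search: fewest clicks from any authority site to each page."""
    distances = {site: 0 for site in authority_sites}
    queue = deque(authority_sites)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


# Hypothetical intranet link graph: page -> pages it links to.
links = {
    "portal": ["hr", "it"],
    "hr": ["benefits"],
    "it": ["helpdesk"],
    "benefits": ["forms"],
}

with_authority = click_distances(links, ["portal"])
without_authority = click_distances(links, [])
print(with_authority)
print(without_authority)
```

With no authority sites there is no starting point, so no page receives a distance and the signal contributes nothing to ranking.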

Page 39: Share Point2007 Best Practices Final

Dragons 3

  • The value of statistical ranking can vary from the partial indexes to the master merge index
  • Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
  • Results are delayed from servers without Internet connections
  • Backward compatibility
      – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
      – Index files, scopes, search alerts, filters, word breakers and thesaurus files are not upgraded
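One plausible way to see why statistical ranking values differ between partial indexes and the merged index is that term statistics depend on which documents are counted. The sketch below uses inverse document frequency as a stand-in statistic; that choice, the smoothing formula, and the sample data are assumptions for illustration, not a description of MOSS internals.

```python
import math


def idf(term, documents):
    """Smoothed inverse document frequency of a term over a list of token lists."""
    containing = sum(1 for tokens in documents if term in tokens)
    return math.log((1 + len(documents)) / (1 + containing))


# Two hypothetical partial indexes and the merged index built from both.
partial_a = [["search", "scope"], ["search", "crawl"]]
partial_b = [["alert", "rss"], ["profile", "people"], ["search", "people"]]
merged = partial_a + partial_b

for name, index in (("partial_a", partial_a), ("partial_b", partial_b), ("merged", merged)):
    print(name, round(idf("search", index), 3))
```

The same term gets a different weight in each index, so scores computed before and after a master merge need not agree.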

Resources

Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch

Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US

Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc

Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html

MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx

MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx

MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx

Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx

Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open

More Resources

Enterprise Search blog: http://blogs.msdn.com/enterprisesearch

MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx

Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx

Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html

Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf

Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies

Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14

SEO Advice from a Propellerhead for …: http://www.mossseo.com

Even More Resources

MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3

SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links

All About SharePoint (S.S. Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx

Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx

MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx

MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx

Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx

Appendix

Auto Classification Products

Concept Searching

Auto-classifies documents for MOSS 2007

Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)

Extracts concepts and weights their relevance to the searcher's query

– Presents them for search refinement

http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS

Extracts metadata and compound terms

Integrates with an existing taxonomy if one exists

Appends metadata and stores it as a MOSS property

Part of the main MOSS index

Uses standard MOSS administration features

Adjusting Relevance Property Weights

Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking.

Change the default property weights through the Schema Object Model:

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
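To see why a property weight matters, here is a toy Python sketch in which a document's score is a weighted count of query terms found in each property. This is an illustrative assumption, not the MOSS ranking formula. The `weighted_score` function, the sample documents, and the weight values are all hypothetical.

```python
def weighted_score(query_terms, document, weights):
    """Sum, over properties, weight * number of query terms found in that property."""
    score = 0.0
    for prop, text in document.items():
        words = set(text.lower().split())
        hits = sum(1 for term in query_terms if term.lower() in words)
        score += weights.get(prop, 1.0) * hits
    return score

# Two hypothetical documents: A matches in the title, B matches in the body.
docs = {
    "A": {"Title": "Expense policy", "Body": "How to file travel claims"},
    "B": {"Title": "Travel guide", "Body": "See the expense policy for limits on claims"},
}
query = ["expense", "policy"]

flat = {"Title": 1.0, "Body": 1.0}      # every property counts the same
boosted = {"Title": 5.0, "Body": 1.0}   # title matches count five times as much

flat_scores = {name: weighted_score(query, doc, flat) for name, doc in docs.items()}
boosted_scores = {name: weighted_score(query, doc, boosted) for name, doc in docs.items()}
print(flat_scores)
print(boosted_scores)
```

With flat weights the two documents tie. Once 'Title' is boosted, the document that matches in its title ranks first, which is the effect the slide describes.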

Push/Pull Data to Users

Alerts

Same alerting infrastructure for WSS and MOSS

– The timer service handles all alert notifications

Frequency can be set to Daily/Weekly

– Notifications for search alerts are sent according to the alert's creation time

The 'Alert Me' link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part

A rollup of all of a user's alerts for a site collection

– http://<sitecollection>/_layouts/MySubs.aspx

Alert "gotchas"

– No "My Alerts Summary" web part

– No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types

RSS Feeds

Users can subscribe to an RSS feed of the search results

The 'RSS' link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part

Protocol Handlers

Connect to a content source and enumerate its documents

Ship with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and the Business Data Catalog

Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others

http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
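The "connect, then enumerate" role of a protocol handler can be sketched in a few lines of Python. This is only a conceptual stand-in: real MOSS protocol handlers are COM components, and the `FileShareHandler` class and its method name are hypothetical. The sketch builds a temporary folder to act as the content source.

```python
import os
import tempfile

class FileShareHandler:
    """Hypothetical handler: connects to a folder and enumerates its documents."""

    def __init__(self, root):
        self.root = root

    def enumerate_documents(self):
        # Walk the content source and yield (relative path, size in bytes) per file.
        for folder, _subfolders, files in os.walk(self.root):
            for name in sorted(files):
                path = os.path.join(folder, name)
                yield os.path.relpath(path, self.root), os.path.getsize(path)

# Build a throwaway "file share" so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as share:
    os.makedirs(os.path.join(share, "policies"))
    with open(os.path.join(share, "readme.txt"), "w") as f:
        f.write("hello")
    with open(os.path.join(share, "policies", "travel.txt"), "w") as f:
        f.write("travel policy")
    found = sorted(FileShareHandler(share).enumerate_documents())

print(found)
```

A crawler would pass each enumerated item on for filtering and indexing; supporting a new repository then means supplying another handler with the same enumerate contract.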

The Query Object Model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
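The RowLimit, StartRow, and AllKeywords settings in the query snippet amount to filtering and paging. The following is a toy Python stand-in that shows that behavior on an in-memory list. It is not the MOSS Query object model, and `keyword_query` and the sample documents are hypothetical.

```python
def keyword_query(documents, query_text, start_row=0, row_limit=10):
    """Return one page of titles whose text contains every keyword in query_text."""
    keywords = query_text.lower().split()
    matches = [
        title
        for title, text in documents
        if all(word in text.lower().split() for word in keywords)
    ]
    # start_row is the zero-based offset of the first result; row_limit is the page size.
    return matches[start_row:start_row + row_limit]

documents = [
    ("Doc 1", "search configuration guide"),
    ("Doc 2", "search scopes"),
    ("Doc 3", "configuration of search alerts"),
    ("Doc 4", "people search configuration"),
]

page_one = keyword_query(documents, "search configuration", start_row=0, row_limit=2)
page_two = keyword_query(documents, "search configuration", start_row=2, row_limit=2)
print(page_one)
print(page_two)
```

Requiring all keywords excludes "Doc 2", and advancing the start row by the row limit walks through the remaining matches one page at a time.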

Metadata Property Mapping

Crawled properties

– Emitted by iFilters and protocol handlers

– Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties

– Mapping target for crawled properties (many-to-many)

– Identified by internal ID

– Friendly name used in queries

– Can be used in the query as property:value
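The many-to-many mapping can be pictured as a lookup table from managed properties to the crawled properties that feed them. The Python sketch below is a simplified assumption of how a name:value restriction could resolve through such a table. The crawled property identifiers, sample values, and both helper functions are hypothetical.

```python
# Hypothetical mapping: each managed property lists the crawled properties that feed it.
# "Mail:6" feeds two managed properties, and "Author" is fed by two crawled properties.
managed_to_crawled = {
    "Author": ["Office:4", "Mail:6"],
    "Sender": ["Mail:6"],
    "Title": ["Office:2"],
}

def managed_value(managed_name, crawled_values):
    """Return the first non-empty crawled value mapped to the managed property."""
    for crawled_name in managed_to_crawled.get(managed_name, []):
        value = crawled_values.get(crawled_name)
        if value:
            return value
    return None

def matches(query, crawled_values):
    """Evaluate a single 'name:value' restriction against one document."""
    name, _, wanted = query.partition(":")
    actual = managed_value(name, crawled_values)
    return actual is not None and actual.lower() == wanted.lower()

# Crawled values as an iFilter or protocol handler might emit them (made-up data).
word_doc = {"Office:4": "A. Author", "Office:2": "Enterprise Search"}
email = {"Mail:6": "B. Sender"}

print(matches("Author:a. author", word_doc))
print(managed_value("Author", email))
```

Because the query uses the friendly managed name, the same restriction works for a Word document and an e-mail even though each stores the value under a different crawled property.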

  • Enterprise Search
  • Slide 2
  • Agenda
  • Introduction
  • There is No Magic Bullet
  • Search Index A Different Kind of Database
  • Web Search and Enterprise Search
  • Advanced Search
  • MOSS 2007 Search
  • Better Than Ever
  • Simplified Administration UI
  • Indexing Performance Improvements
  • Relevance Types
  • Relevance Enhancements
  • MOSS 2007 Faceted Search
  • Federated Search
  • People Search
  • People Search Results Page
  • Slide 19
  • FAST ESP Technology
  • Custom Results
  • Scalability
  • Security
  • Search Analytics
  • Configuring MOSS 2007 Search
  • Search Roadmap
  • Pareto's Principle
  • Define Content
  • Designate Authority Sites
  • Educate Structural Influences
  • Educate Content Influences
  • Enhanced Search Results
  • Hardware Considerations
  • Indexing Configuration
  • Customizing Results Display
  • Here There Be Dragons
  • Dragons 1
  • Dragons 2
  • Dragons 3
  • Resources
  • More Resources
  • Even More Resources
  • Appendix
  • Auto Classification Products
  • Adjusting Relevance Property weights
  • Push/Pull Data to Users
  • Protocol Handlers
  • The Query object model
  • Metadata Property Mapping
Page 44: Share Point2007 Best Practices Final

Auto Classification Products

Concept Searching

  • Auto-classifies documents for MOSS 2007
  • Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
  • Extracts concepts and weights their relevance to searcher query
      - Presents for search refinement
  • http://www.conceptsearching.com/conceptHMSO (insider trading)

Integration with MOSS

  • Extracts metadata and compound terms
  • Incorporates with existing taxonomy if one exists
  • Appends metadata and stores as MOSS property
  • Part of the main MOSS index
  • Uses standard MOSS administration features

Page 45: Share Point2007 Best Practices Final

Adjusting Relevance Property weights

  • Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
  • Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are supplied by the caller
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Set the weight of the parameter named by 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Poll once per second until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
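
To see why a heavier weight on a property such as Title changes the order of results, here is a minimal sketch in Python. It is not the MOSS ranking function: the scoring rule (weight times number of query-term matches, summed over properties), the weights, and the two sample documents are all assumptions made up for illustration.

```python
# Illustrative only: a toy weighted-field scorer, NOT the MOSS ranking function.
# Property names, weights, and documents below are hypothetical.

def score(doc, query_terms, weights):
    """Sum, over properties, of weight * number of query-term occurrences."""
    total = 0.0
    for prop, text in doc.items():
        words = text.lower().split()
        hits = sum(words.count(term.lower()) for term in query_terms)
        total += weights.get(prop, 1.0) * hits
    return total


def rank(docs, query_terms, weights):
    """Return document names ordered from highest to lowest score."""
    return sorted(docs, key=lambda name: score(docs[name], query_terms, weights),
                  reverse=True)


docs = {
    "policy.doc": {"Title": "Travel policy", "Body": "Rules for booking flights"},
    "memo.doc": {"Title": "Weekly memo", "Body": "travel travel travel reminders"},
}

flat = {"Title": 1.0, "Body": 1.0}
title_heavy = {"Title": 5.0, "Body": 1.0}

print(rank(docs, ["travel"], flat))         # the memo with repeated body hits ranks first
print(rank(docs, ["travel"], title_heavy))  # the title match now ranks first
```

With equal weights the memo wins on sheer repetition in the body; raising the Title weight moves the document that is actually about travel to the top, which is the effect the slide describes.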

Page 46: Share Point2007 Best Practices Final

Push/Pull Data to Users

Alerts

  • Same alerting infrastructure for WSS and MOSS
      - Timer service is used to handle all alert notifications
  • Frequency can be set to Daily/Weekly
      - Notifications for search alerts will be sent according to the creation time
  • 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  • A rollup of all of a user's alerts for a site collection
      - http://<sitecollection>/_layouts/MySubs.aspx
  • Alert "gotchas"
      - No "My Alerts Summary" web part
      - No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for WSS alert types

RSS Feeds

  • Ability to subscribe for an RSS feed on the search results
  • 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
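
One way to read "sent according to the creation time" is that a daily or weekly search alert fires at whole multiples of its period, counted from the moment the alert was created. The Python sketch below works through that arithmetic. The helper name and this reading of the behavior are assumptions; the actual timer job may schedule differently.

```python
from datetime import datetime, timedelta

# Hypothetical helper: assumes notifications fire at creation_time + k * period.
PERIODS = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def next_notification(created, frequency, now):
    """Return the first scheduled time strictly after `now`."""
    period = PERIODS[frequency]
    if now < created:
        return created + period
    elapsed_periods = (now - created) // period  # whole periods already passed
    return created + (elapsed_periods + 1) * period


created = datetime(2009, 2, 6, 9, 30)
now = datetime(2009, 2, 9, 8, 0)
print(next_notification(created, "daily", now))   # 2009-02-09 09:30:00
print(next_notification(created, "weekly", now))  # 2009-02-13 09:30:00
```

The practical consequence, if this reading holds, is that two users with the same weekly alert can receive it on different days simply because they subscribed at different times.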

Page 47: Share Point2007 Best Practices Final

Protocol Handlers

  • Connects to a content source and enumerates the documents
  • Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
  • Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
  • http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
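
The "connects to a content source and enumerates the documents" idea can be pictured as a lookup from the address scheme to a handler that knows how to list what is there. The Python sketch below is a toy model of that dispatch, not the actual protocol handler interface; the handler classes, the registered schemes, and the canned listings are all hypothetical.

```python
from urllib.parse import urlparse

# Toy model of protocol-handler dispatch; not the MOSS protocol handler interface.
# Handler classes, schemes, and the sample listings are hypothetical.


class FileShareHandler:
    def enumerate(self, start_address):
        # A real handler would walk the share; this returns canned data.
        return [start_address + "/budget.xls", start_address + "/plan.doc"]


class WebHandler:
    def enumerate(self, start_address):
        # A real handler would follow links; this returns canned data.
        return [start_address + "/default.aspx"]


HANDLERS = {"file": FileShareHandler(), "http": WebHandler()}


def crawl(start_address):
    """Pick a handler by URL scheme and list the documents it exposes."""
    scheme = urlparse(start_address).scheme
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError("no protocol handler registered for " + repr(scheme))
    return handler.enumerate(start_address)


print(crawl("file://server/share"))
print(crawl("http://intranet"))
```

An address whose scheme has no registered handler (for example a partner repository) fails until a handler for it is added to the registry, which mirrors why third-party handlers exist for the systems listed above.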

Page 48: Share Point2007 Best Practices Final

The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the caller
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
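
The RowLimit and StartRow settings are what make paging possible: keep the limit fixed and advance the start row by that amount for each page. The Python sketch below shows the pattern against an in-memory list. It does not call SharePoint; `run_query` and `iter_pages` are hypothetical stand-ins that only mirror the two parameters.

```python
# Toy in-memory stand-in for a result set; not the SharePoint query API.
# `run_query` and its parameters mirror the RowLimit/StartRow idea only.


def run_query(all_hits, start_row=0, row_limit=10):
    """Return one page of hits: `row_limit` rows beginning at `start_row`."""
    return all_hits[start_row:start_row + row_limit]


def iter_pages(all_hits, row_limit=10):
    """Yield successive pages by advancing start_row by row_limit."""
    start_row = 0
    while True:
        page = run_query(all_hits, start_row, row_limit)
        if not page:
            break
        yield page
        start_row += row_limit


hits = ["doc%d" % i for i in range(25)]
pages = list(iter_pages(hits, row_limit=10))
print([len(p) for p in pages])  # [10, 10, 5]
```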

Page 49: Share Point2007 Best Practices Final

Metadata Property Mapping

Crawled properties

  • Emitted by iFilters and Protocol Handlers
  • Identified by a property set (GUID) and property ID (name or numeric ID)

Managed properties

  • Mapping target for crawled properties (many-to-many)
  • Identified by internal ID
  • Friendly name used in queries
      - Can be used in the query with property:Value
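
The many-to-many mapping and the property:Value query form can be sketched in a few lines of Python. This is a toy model under stated assumptions: the property-set identifiers are placeholder strings rather than real GUIDs, the mapping table and sample document are invented, and the "first mapped source wins" rule is a simplification chosen for the example.

```python
# Toy model of crawled-to-managed property mapping; all names are hypothetical.

# Crawled properties are keyed by (property set GUID, property ID).
# The GUID strings below are placeholders, not real property-set GUIDs.
MAPPINGS = {
    "Author": [("guid-office", "Author"), ("guid-mail", "From")],
    "Title": [("guid-office", "Title")],
    "Contributor": [("guid-office", "Author")],  # same crawled property, second target
}


def managed_values(crawled_doc):
    """Build managed properties from one document's crawled properties."""
    managed = {}
    for name, sources in MAPPINGS.items():
        for key in sources:
            if key in crawled_doc:
                managed[name] = crawled_doc[key]  # first mapped source wins
                break
    return managed


def matches(managed, query):
    """Evaluate a single `property:value` restriction, case-insensitively."""
    prop, _, value = query.partition(":")
    wanted = {k.lower(): v.lower() for k, v in managed.items()}
    return wanted.get(prop.lower()) == value.lower()


mail = {("guid-mail", "From"): "Sweeny", ("guid-office", "Title"): "Search notes"}
props = managed_values(mail)
print(props)
print(matches(props, "author:sweeny"))
```

Here a mail item that only carries a From field still becomes searchable as author:sweeny, because two different crawled properties feed the one managed property with the friendly name Author.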
