Carlos Valcarcel: Architecture - FAST Search Server 2010 for SharePoint

Architecture: FAST Search Server 2010 for SharePoint. SharePoint Saturday. Carlos Valcarcel, FAST Technology Specialist, FAST, A Microsoft Subsidiary.

Upload: sharepoint-saturday-ny

Post on 02-Dec-2014


Category: Technology



TRANSCRIPT

Page 1: Carlos Valcarcel: Architecture - FAST Search Server 2010 for SharePoint

Architecture: FAST Search Server 2010 for SharePoint

SharePoint Saturday

Carlos Valcarcel
FAST Technology Specialist, FAST, A Microsoft Subsidiary

Page 2: Agenda

• Demo: FAST Search Server 2010
• FAST: A Brief Time of History
• SharePoint 2010 Search features
• FAST Search Server 2010
  • Features
  • Architecture
• Why FAST Search Server instead of SharePoint search?

Page 3: Demo

MSW: Microsoft Internal Web Site

Page 4: FAST: A Brief Time of History

Where did FAST come from?

• You’ve probably heard it all before.
• FAST was founded in 1997; it was 11 when the acquisition completed (2008).
• AllTheWeb.com: still an active site!
  • Sold by FAST to Overture, then Overture was bought by Yahoo!
• FAST invested in enterprise search
  • Our flagship product, ESP, powers some of the largest sites on the web
  • Dell, Best Buy, Scirus (Reed Elsevier), Financial Times, Oodle, Rakuten
• When we OEM’ed our product:
  • Documentum
  • Dell Message One (Email/eDiscovery)
  • CommVault
  • EMC Centera
  • MatterSpace®

Page 5: SharePoint 2010 Search

A Brief Look: Great New Features! Less Filling! Secret Ingredients from Norway!

• Linear scalability
• Support for more languages
• Better relevancy
• Support for 100 million documents per farm
• Federated results on one page (OpenSearch compliant)
• Navigators (navigator counts not displayed)
• Users can tag documents
• SharePoint follows clicks to boost relevancy
• Auto detect languages in documents
• User can increase boosting based on language
• Query completion
• Did you mean…?
• Sub-second response time
• Synonym support (called Aliases)
• Phonetic matching ("Sharten Mickleson" matches "Kjartan Mikkelsen")
• Native 64-bit deployment
• Scaling along all dimensions
  • Query processing across multiple servers
  • Merges results from multiple nodes
• Search dashboard
• Adding content
• Crawl rules
• PowerShell has 128 cmdlets for search, so everything you want to do for search can now be scripted.

Page 6: FAST Search Server 2010

The Future of SharePoint Search: More and Better (did I mention with Secret Ingredients from Norway?)

• Almost everything available in SharePoint 2010
• Lemmatization/Stemming
• Document Thumbnail and Preview
• Visual Best Bets
• People Search with phonetic search
• Federated Search (OpenSearch)
• Single search (federated) across all content
• Relevancy per audience
  • Custom GUI per audience is possible
  • Location, Language, Role, and Search aware
  • Document boosting and blocking (click-through relevancy)
• Document processing pipeline
• Synonyms
• Secure Search
• Dynamic navigators (OOTB and custom)
• Taxonomy
• Breadcrumb navigation

Page 7: The GUI: Enhancing the Search Experience

You’ve Got Your Search in My Collaboration Platform!

FS4SP

Page 8: User Interface is visual and actionable

Visual and conversational interaction with precise control

Built on SharePoint Search Center
• Leverages all of the innovations in SharePoint
• Open Web Parts, Federation, query suggestions, related queries, Did you mean?

Visual results connect users with content
• Thumbnails for Word and PowerPoint
• Visual Best Bets highlight premium content
• Preview in browser without leaving the results

[Screenshot callouts: Deep Refinement, Thumbnails, Previews, Sort on any field, Similar Results]

Page 9: Map metadata to Managed Properties

Automatic association of metadata to content

Crawled Properties: Standard document metadata discovered by the crawler or extracted from the full text by the FAST Content Processing Pipeline.
• Location: Redmond, WA; Oslo, Norway
• Company: Microsoft; FAST
• Date: January 8, 2008; January 4, 2008
• Concept: Cash tender; Share price

Managed Properties: Map one or more Crawled Properties to a single field. Enables sorting, refinement, relevance tuning and fielded searching.

Any data can be found!! Maps automatically or through Central Administration or PowerShell.

Index Profile Managed Properties

Type  DocId  Title          Author  Date        Size  Location  Company    Concept      Body
      123    Press Release  …       01/08/2008  26K   Redmond   Microsoft  Cash Tender
      345    …              …       …           …     …         …          …            …
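The crawled-to-managed mapping described on this slide can be pictured with a small Python sketch. This is only an illustration of the idea, not how FS4SP implements it: the crawled property names (`extracted_location`, `doc_city`, and so on), the `MANAGED_PROPERTY_MAP` table, and the "first match wins" rule are all assumptions made for the example.

```python
from typing import Any

# Hypothetical mapping: each managed property lists the crawled
# properties that may feed it, in priority order.
MANAGED_PROPERTY_MAP: dict[str, list[str]] = {
    "Location": ["extracted_location", "doc_city"],
    "Company": ["extracted_company"],
    "Date": ["last_modified", "extracted_date"],
}


def to_managed(crawled: dict[str, Any]) -> dict[str, Any]:
    """Build one managed-property record from a document's crawled properties."""
    managed: dict[str, Any] = {}
    for managed_name, sources in MANAGED_PROPERTY_MAP.items():
        for source in sources:
            if source in crawled:
                managed[managed_name] = crawled[source]
                break  # assumption: the first matching crawled property wins
    return managed


# Crawled properties for the press-release example on the slide.
crawled_doc = {
    "extracted_location": "Redmond",
    "extracted_company": "Microsoft",
    "extracted_date": "01/08/2008",
}
record = to_managed(crawled_doc)
```

Once several crawled properties collapse into a single field such as `Location`, that field is what sorting, refinement and fielded search operate on.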

Page 10: How does it work?

Example: Create a custom entity extractor

• Put your terms in the out-of-the-box extraction dictionaries by modifying an XML file
• Map the crawled property to a managed property
• Index your content
• Modify the refinement panel web part

[Screenshot callout: Customized Extraction Dictionary]
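To make the dictionary step concrete, here is a minimal Python sketch of dictionary-driven extraction. The XML layout (`<dictionary>` with `<entry key="...">` elements), the sample terms, and the function names are hypothetical; the real FS4SP dictionary file format and matching rules may well differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary layout, for illustration only.
DICTIONARY_XML = """
<dictionary>
  <entry key="Contoso Widget" />
  <entry key="Fabrikam Gadget" />
</dictionary>
"""


def load_terms(xml_text: str) -> list[str]:
    """Read the dictionary terms from the XML text."""
    root = ET.fromstring(xml_text)
    return [entry.attrib["key"] for entry in root.iter("entry")]


def extract_entities(text: str, terms: list[str]) -> list[str]:
    """Return the dictionary terms that occur in the text (case-insensitive)."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


terms = load_terms(DICTIONARY_XML)
product_names = extract_entities("We shipped the contoso widget last week.", terms)
```

The extracted values would then surface as a crawled property, which is what gets mapped to a managed property and exposed as a refiner.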

Page 11: How does it work?

Add refiners to user interface

• Built on a SharePoint List or custom extractor
• Edit the Search Center Results Page
• Modify the shared web part by adding tags to the refinement panel XML
• Create your own labels
• Save and Publish

[Screenshot callout: Custom Collections]

Page 12: Quickly build a contextual experience

User based tools for creating results that are relevant to your users

Pick the right ingredients: Match the proper terms and contexts to boost relevancy for targeted users to ensure your users are always finding the right content
• One-way synonyms: Keywords map to other terms
• Two-way synonyms: Keywords become equivalent to other terms
• Best Bets: Highlights key resources that are always relevant to a keyword
• Visual Best Bets: Extend Best Bets with pictures, video, Silverlight controls
• Document Promotion / Demotion: Tailor specific document relevancy

Create new user contexts: Site administrators create contexts based on user profiles to deliver relevant results to the right audiences

Create new keywords: Site Administrators have powerful and simple tools to configure the search experience for groups of users
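The difference between one-way and two-way synonyms is easiest to see as query expansion. The Python sketch below is a simplified illustration; the synonym tables, the sample terms, and the `expand_query` function are made up for the example and are not part of the product.

```python
# Hypothetical synonym tables; the terms are illustrative only.
ONE_WAY: dict[str, set[str]] = {"laptop": {"notebook"}}  # "laptop" also searches "notebook"
TWO_WAY: list[set[str]] = [{"car", "automobile"}]        # either term searches both


def expand_query(term: str) -> set[str]:
    """Return the set of terms actually searched for a single query term."""
    expanded = {term}
    # One-way: only the keyword on the left-hand side is expanded.
    expanded |= ONE_WAY.get(term, set())
    # Two-way: every member of the group expands to the whole group.
    for group in TWO_WAY:
        if term in group:
            expanded |= group
    return expanded
```

With these tables, a search for "laptop" also looks for "notebook", but a search for "notebook" does not pull in "laptop"; "car" and "automobile" each expand to both.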

Page 13: Deliver results that are contextually relevant

with search that understands your business and role

”What should I know about selling ERP?” - Alan Brewer, Sales Lead

”What should I know about implementing ERP?” - Renee Lo, Consultant

[Screenshot callouts: Role-specific relevance; Business driven refinement; Targeted Best Bets / Visual Best Bets]

Page 14: Rank Profiles

Tune relevancy without impacting the default algorithm

• Quality: Also known as static rank, consists of multiple managed properties including site, URL depth (preference for shorter URLs), and relative importance of links to this document.
• Authority: Applies when the query word falls in the link or anchor text.
• Query Authority: Maps the popularity of a document, or the click-through rate when documents are clicked as a result of a query.
• Freshness: Increases the relevancy if a document was recently created or modified, based on the last modified property.
• Proximity: Applies to where query terms fall and how close they are to each other within a document.
• Context: Increases the rank of a document if the query term is a managed property associated with that document.
• Managed Property: Affects relevancy when a managed property contains a specific value, such as Woodgrove Bank or Financial Services.

Out of the box relevancy: Tuned for a great general productivity experience; relevancy improves with click-throughs and link text analysis.

Extend the default algorithms: Create new default relevancy models. Blend static and dynamic ranking parameters to instantly improve search results.
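One way to picture "blending" rank components is a weighted sum, where a custom profile simply changes the weights. The Python sketch below is an assumption-laden illustration, not the actual FAST ranking formula: the `RankProfile` class, the choice of a linear sum, the subset of components, and all the numbers are hypothetical.

```python
from dataclasses import dataclass

COMPONENTS = ("quality", "authority", "freshness", "proximity", "context")


@dataclass
class RankProfile:
    """Hypothetical weights for a subset of the rank components listed above."""
    quality: float = 1.0
    authority: float = 1.0
    freshness: float = 1.0
    proximity: float = 1.0
    context: float = 1.0


def score(profile: RankProfile, signals: dict[str, float]) -> float:
    """Weighted sum of per-document signals, each assumed to lie in [0, 1]."""
    return sum(getattr(profile, name) * signals.get(name, 0.0) for name in COMPONENTS)


default = RankProfile()
news = RankProfile(freshness=3.0)  # custom profile that favours recent documents

# Made-up signals for two documents matching the same query.
old_doc = {"quality": 0.9, "proximity": 0.5, "freshness": 0.1}
new_doc = {"quality": 0.4, "proximity": 0.5, "freshness": 0.5}
```

Under the `default` profile the older, higher-quality document scores higher; under the `news` profile the ordering flips, and the default profile itself is left untouched.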

Page 15: How to create a Rank Profile

IT Pros are empowered to create new profiles quickly

Rank Profiles are created in PowerShell by extending the default relevancy algorithm, and are exposed in the user interface by modifying the sorting web part.

Page 16: Leveraging the platform to build applications

Putting together all of the pieces to build search-driven applications

Back End Processing Tasks:
• Load content from many different places
  • Out of the box connectors for SharePoint, Exchange public folders, and shared files
  • SharePoint Designer to configure connection to customer portfolio/holdings database
• Create custom metadata with content processing pipeline
  • Names of holdings, offerings, key concepts, companies, people
  • Synonyms for key concepts (real estate ~ REIT)
  • Roll-ups configured with optional results collapsing stage
• Create custom relevance profile

Designers can stylize the User Interface
• Apply styles to web parts
• Federation, People Search, Search actions

Build custom web parts for visual navigation

Use SharePoint workflows to perform business specific actions

Page 17: Simplified, powerful administration

A high-end enterprise search solution that’s easy to deploy and manage

• Deploy easily using wizard-driven installation, a topology designer, and native support for 64-bit virtualization
• Manage efficiently with full support for Microsoft System Center and PowerShell scripting to automate tasks
• Streamline administration with a simplified admin console that helps you manage search services across your enterprise

Page 18: Architecture

FS4SP

Page 19: Microsoft’s 2010 Dog-Food Farm

Description: Team Collaboration Portal & Social Networking. Day to day work and internal experiments.

Data Set:
• Farm’s Total Data Size: 1.8 TB
• Largest Content Database: 800 GB
• Largest Site Collection Size: 280 GB
• Logging DB Size (14 days): 300 GB
• Number of Web Applications: 4
• Number of Content Databases: 13
• Total number of Site Collections: 7,700 (7,200 My Sites)
• Number of User Profiles in Profile DB: 193,000
• Total number of Documents: 4 Million

Workload:
• Total number of users per week: 15,200
• Concurrent users (Distinct Users per Minute): ~200
• Total Requests per day: ~7,000,000
• Hourly Average RPS [Requests per Sec]: ~150
• Hourly Max RPS [Requests per Sec]: 270
• Search Full Crawl generating ~75%

Page 20: FAST Search for SharePoint Scaleout

Back-end with extreme and flexible scale out options

Scale-out in multiple “dimensions”:
• Query Volume
• Content Volume
• Indexing freshness

Redundancy options:
• Search
• Indexing

Performance targets*:
• 30M docs/node
• 50 QPS/node
• 35 docs/sec

*Depends on content and hardware specifics

[Diagram: Content Volume and Query Volume axes; tiers for Crawling and Content Processing, Search and Indexing, and Query and Result Processing]

No theoretical upper bounds!
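As a back-of-the-envelope illustration, the per-node targets on this slide can be turned into a rough node count. The Python sketch below rests on simplifying assumptions that are not stated in the deck: the index is split into partitions by document count, each partition is replicated to absorb query load, and every query touches every partition. The function name and the example workload are hypothetical.

```python
import math

# Per-node targets quoted on the slide; they depend on content and hardware.
DOCS_PER_NODE = 30_000_000
QPS_PER_NODE = 50


def estimate_search_nodes(total_docs: int, peak_qps: float) -> tuple[int, int, int]:
    """Return (index partitions, replicas per partition, total search nodes)."""
    # Content volume drives the number of index partitions.
    partitions = max(1, math.ceil(total_docs / DOCS_PER_NODE))
    # Query volume drives how many copies of each partition are needed.
    replicas = max(1, math.ceil(peak_qps / QPS_PER_NODE))
    return partitions, replicas, partitions * replicas


# Example workload: 100 million documents at a peak of 120 queries per second.
layout = estimate_search_nodes(total_docs=100_000_000, peak_qps=120)
```

For that example the estimate is 4 partitions, 3 replicas of each, 12 search nodes in total. Content volume and query volume scale independently, which is the point of the "dimensions" on the slide; the capacity planning white paper in the references is the place to get real numbers.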

Page 21: FAST Search Server 2010: Summary of architectural components

[Architecture diagram. Labels recovered from the figure:]

• Custom Front-End
• OpenSearch or Other Sources
• SharePoint Front-end
• People Search
• Query Object Model
• Query and Result Processing
• Security Access Module
• Search Core
• Indexing
• Federation Object Model
• Query Web Service
• Advanced Content Processing
• Linguistics
• Web Link Analysis
• Connectors: Web Crawler, JDBC
• Connectors: SharePoint, File Traverser, Web, BDC, Exchange, Notes, Documentum
• Microsoft System Center Operations Manager
• Monitoring Services
• Administration and Schema Object Model
• Site Collection Level Admin UI: Keyword Management, User Context Management, Site Promotion/Demotion
• PowerShell: Schema configuration, Admin configuration, Deployment configuration
• Central Administration UI: Property mapping, Property extraction, Spell-checking
• Server groups: FAST Server(s), SharePoint Server(s), Other Server(s)
• Content

Page 22: Search LOB Systems via BDC/BCS

Enhance SharePoint platform capabilities with out-of-box features, services, and tools that streamline development of solutions with deep integration of External Data and Services.

[Diagram. Labels recovered from the figure:]

• External systems: Dynamics, SAP, Siebel, LOB, Web 2.0
• SharePoint: Dev Platform, Business Intelligence, Enterprise Content Management, Collaboration, Social, Enterprise Search
• Model Store, BDC Runtime, LOB/Doc Binding, Security, Out of box Parts
• Office Apps: BDC Client Runtime, Cache, Offline Operations
• Design Tools: SPD, VSTO

Page 23: Document Processing Pipeline Stages

Default stages:
• Format Conversion: iFilters, OutSideIn
• Language detection and encoding
• Lemmatizer: Linguistics normalization
• Tokenizer: Word breaking
• Entity Extraction: Persons, companies, locations, email, date/time, URL, prices, file names
• DateTimeNormalizer: Date normalization
• Vectorizer: Create document vector for similarity searching
• WebAnalyzer: Anchor text and link cardinality analysis
• PropertiesMapper: Map to crawled properties
• PropertiesReporter: Report detected properties

Optional stages:
• XML Properties mapper
• Offensive Content Filter
• Verbatim extractor: Loads dictionary for custom extraction, e.g. product names
• Field Collapsing

[Pipeline diagram: Format Conversion, Language Detection, Entity Extraction, Configurable Stages, Mapper]

The different plug-ins can be configured either from the UI or from config files.
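The pipeline idea itself, a document passing through an ordered list of stages that each add or normalize fields, can be sketched in a few lines of Python. The stage functions below are toy stand-ins with hypothetical names and logic; they are not the FAST stages and only mirror the order of three of them.

```python
from typing import Any, Callable

Document = dict[str, Any]
Stage = Callable[[Document], Document]


def detect_language(doc: Document) -> Document:
    # Toy heuristic standing in for real language detection.
    text = f" {doc['body']} "
    doc["language"] = "no" if " og " in text else "en"
    return doc


def tokenize(doc: Document) -> Document:
    # Whitespace splitting stands in for real word breaking.
    doc["tokens"] = str(doc["body"]).lower().split()
    return doc


def extract_companies(doc: Document) -> Document:
    known = {"microsoft", "fast"}  # hypothetical extraction dictionary
    doc["companies"] = sorted(t for t in set(doc["tokens"]) if t in known)
    return doc


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Order matters: extraction relies on the tokens produced by the tokenizer.
PIPELINE: list[Stage] = [detect_language, tokenize, extract_companies]
result = run_pipeline({"body": "Microsoft acquires FAST"}, PIPELINE)
```

An optional stage would slot in as one more function in the `PIPELINE` list.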

Page 24: Content Processing and Schema

• Extracted document attributes reported as Crawled Properties
• Crawled Properties mapped to Managed Properties
• Characteristics are defined for Managed Properties, e.g.
  • Refiners
  • Sorting
  • Queryable
  • Type
• Definition and mapping done via UI or PowerShell

[Diagram. Labels recovered from the figure:]

• Clients: Admin UI, Schema CmdLets, Custom Client
• Schema Object Model
• Schema Service (hosted in IIS)
• Property backend, bliss, psctrl, configserver
• Update Tools, Persistence
• Document Processing Pipeline: PropertiesMapper, PropertiesReporter
• Arrows: Update configuration; Alert pipeline of updated schema; Report discovered crawled properties

Page 25: Pipeline Extensibility API

Motivation
• Straightforward way to add text analysis functionality
• Flexibility and supportability

Example uses
• Sentiment analysis
• Translation
• Auto-Classification

Mechanism
• Just before Mapper
• “any” binary
• Runs in sandbox with timeout

[Diagram: Standard processing, then Extensibility, then Mapper]
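The "any binary, with a timeout" mechanism can be sketched in Python with `subprocess.run`. This is a sketch under stated assumptions: the slide does not say how FS4SP hands data to the external program, so passing text over stdin/stdout is an assumption here, the commands are stand-ins built from the Python interpreter, and no real sandboxing is shown, only the timeout and the fallback.

```python
import subprocess
import sys


def run_external_stage(command: list[str], text: str, timeout_s: float) -> str:
    """Run an external program on `text`; return the input unchanged on failure."""
    try:
        completed = subprocess.run(
            command,
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs longer than this
            check=True,         # a non-zero exit code raises CalledProcessError
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return text  # skip the custom stage rather than stall the pipeline
    return completed.stdout


# Stand-in "binaries": one upper-cases stdin, one sleeps past the timeout.
UPPERCASE = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
SLEEPER = [sys.executable, "-c", "import time; time.sleep(5)"]

processed = run_external_stage(UPPERCASE, "cash tender", timeout_s=10)
timed_out = run_external_stage(SLEEPER, "cash tender", timeout_s=0.5)
```

The design point is that a slow or crashing custom stage costs one document its enrichment, not the pipeline its throughput.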

Page 26: Yeah, So What?

Tell Me Something Awesome

SharePoint 2010
• 100 million documents per farm
• Refiners: only uses the first 1000 results
• Search is restricted to one farm

FAST Search Server 2010
• 40 Million Documents per server
• Refiners: exact count from the entire result set
• Content can be indexed and searched across farms
• 3.6 TB of disk space per server (so far!) and support for NAS and SANs
• Full support for VMs (Hyper-V and VMware)
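The refiner difference is worth a quick illustration: counting over only the top-ranked results can hide values that appear further down the result set. The Python sketch below uses a synthetic result list; the `refiner_counts` function and the data are hypothetical and only demonstrate the effect of a 1000-result cutoff versus a full count.

```python
from collections import Counter
from typing import Optional


def refiner_counts(
    results: list[dict[str, str]], field: str, depth: Optional[int] = None
) -> Counter:
    """Count values of `field`; `depth` limits counting to the top-ranked results."""
    considered = results if depth is None else results[:depth]
    return Counter(doc[field] for doc in considered)


# Synthetic, ranked result set: 1,500 hits.
results = [{"company": "Microsoft"}] * 1000 + [{"company": "FAST"}] * 500

shallow = refiner_counts(results, "company", depth=1000)  # first 1000 results only
deep = refiner_counts(results, "company")                 # entire result set
```

Here the shallow refiner never shows "FAST" at all, while the deep refiner reports exact counts for both companies.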

Page 27: Is Something Wrong With SharePoint?

There is nothing wrong with SharePoint!
• SharePoint brings together a number of collaborative technologies that would otherwise not play well together
• As SharePoint adoption spreads, the need for enterprise search only increases
• Search today is where RDBMSs were over 20 years ago

Let me say that again: there is nothing wrong with SharePoint!

Page 28: Why FAST Search Instead of SharePoint Search?

The Present
• SharePoint 2010 search addresses a host of previous issues
• No migration path from SP 2010 to FAST Search 2010

The Future
• Where do you think FAST Search Server will be in 3 years (the next release of SharePoint)?

Page 29: Q and A

You’ve Got Questions

I’ve probably got answers…

Page 30: Agenda

• Demo: FAST Search Server 2010
• FAST: A Brief Time of History
• SharePoint 2010 Search features
• FAST Search Server 2010
  • Features
  • Architecture
• Why FAST Search Server instead of SharePoint search?

Page 31: Thanks

• The organizers of SharePoint Saturday
• To all of you for attending!

Page 32: References

• Capacity Planning White Paper: http://www.microsoft.com/downloads/details.aspx?FamilyID=65b799e3-825c-4398-8cd7-3311d3297997&displaylang=en
• RSS: FAST Search Server 2010 for SharePoint Newly Published Content. If you bookmark only one RSS feed for FAST Search Server 2010 this is the one: http://services.social.microsoft.com/feeds/feed/FASTSearchServer2010NewContent
• Documentation
  • TechNet: http://technet.microsoft.com/en-us/library/ee781286.aspx
• MSDN Blogs
  • Enterprise Search: http://blogs.msdn.com/b/enterprisesearch/
  • Steve Nicolaou, FAST Architect: http://blogs.msdn.com/b/stevennicolaou/
  • Jørgen's FAST Search Blog: http://blogs.msdn.com/b/jorgeni/
  • Dark Corners: http://blogs.msdn.com/b/dark_corners/
• Enterprise Search User Group: Second Wednesday of every month! You missed July! Don’t miss August!
• Case Study: Search and the FBI Sentinel Program
• Author: Marti Hearst, Search User Interfaces (http://www.searchuserinterfaces.com/)
• Next Generation Tools: Content Transformation Service/Interaction Management Service

Page 33: Thank You.

microsoft.com / Enterprise Search

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.