
IBM Software Group

© 2004 IBM Corporation

WebSphere Information Integrator Content Edition and OmniFIND Technical Overview

IBM Software Group | WebSphere software


WebSphere Information Integrator Content Edition

• The Problem and the Solution
• Integration and Connectors
• Federation Services
• Developer and End-user Services
• Portfolio Integrations
• Questions


The Problem: Multiple Silos of Content

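[Chart: number of content repositories per organization. Categories: 1 repository, 2-5 repositories, 6-10 repositories, 10-15 repositories, more than 15 repositories, don't know. Values shown: 36%, 14%, 25%, 17%, 5%, 4% (the pairing of values to categories is not preserved in this transcript).]

Source: “The Future of Content in the Enterprise,” Connie Moore and Robert Markham

Base: 81 North American decision-makers (multiple responses accepted)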


Multiple Content Sources Complicate Enterprise Initiatives

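[Diagram: departments and initiatives each connect separately to many content systems.]

Departments: Customer Service, Marketing, Customers & Partners, Legal, HR, R&D, Finance, Sales & Support

Content systems: Imaging/Document Mgmt, Report Mgmt, Web Content/Media Asset Mgmt, Database/Custom Systems, Network File Systems, Workflow/Business Process Mgmt

Initiatives: Self-Service, Compliance, Call Center, CRM / ERP, Websites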


The Solution: II Content Edition

[Diagram: the same departments, content systems and initiatives as on the previous slide, now connected through a single Information Integrator Content Edition layer.]


The Solution: II Content Edition (formerly known as VeniceBridge)


II Content Edition Functionality

• Real-time, actionable search
  - Federated search
  - IICE-enabled indexed search
• “Portal” application
  - Single access point for all enterprise assets
  - Customer or partner access to relevant content for self-service
• Virtual records management
  - Manage retention of all enterprise content
• Enterprise Information Integration (EII)
  - Complete view of all information about a specific topic, object, project or process
• Production workflow
  - Access all the required content to enable people to perform actions in a business process


The Tao of II Content Edition

• Every user is known to the repository
  - Repositories handle their own authentication and authorization
  - This is sometimes supplemented, but NEVER bypassed
  - Every II CE user has named sessions with the repositories being accessed
• Repositories can only do what they can do
  - II CE doesn’t generally compensate for missing features in repositories
  - II CE offers a superset of functionality
  - Repository profiles describe the capabilities of individual repositories
• Users know of only abstract repositories
  - There is never an API class or method specific to a repository
  - Users do know that multiple repositories are being dealt with
  - Repositories are abstracted, not virtualized (only exception: Virtual Repositories)
• Content and metadata are never stored by II CE
  - II CE is not a content management system; it doesn’t have a repository
  - Using II CE doesn’t require replicating your content or metadata
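The "repositories can only do what they can do" principle can be illustrated with a minimal sketch. The names below (`RepositoryProfile`, `Capability`, `require`) are hypothetical and are not the II CE API; the sketch only shows a caller consulting an abstract repository's profile before using an optional feature, rather than having the integration layer emulate it.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Capability(Enum):
    # Hypothetical capability names, chosen for illustration only.
    FULL_TEXT_SEARCH = auto()
    VERSIONING = auto()
    ANNOTATIONS = auto()


@dataclass(frozen=True)
class RepositoryProfile:
    """Describes what one repository can do (illustrative, not the II CE class)."""
    name: str
    capabilities: frozenset = field(default_factory=frozenset)

    def supports(self, capability: Capability) -> bool:
        return capability in self.capabilities


class UnsupportedOperation(Exception):
    """Raised instead of emulating a feature the repository lacks."""


def require(profile: RepositoryProfile, capability: Capability) -> None:
    # The integration layer does not compensate for missing features;
    # it reports that this repository cannot perform the operation.
    if not profile.supports(capability):
        raise UnsupportedOperation(
            f"{profile.name} does not support {capability.name}")


archive = RepositoryProfile("archive", frozenset({Capability.VERSIONING}))
can_version = archive.supports(Capability.VERSIONING)
can_annotate = archive.supports(Capability.ANNOTATIONS)
```

Application code stays repository-neutral: it asks the profile a question instead of branching on the repository's product name.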


[Diagram: II Content Edition architecture, hosted on WebSphere Application Server]

Clients: Web Client, Enterprise Applications, Custom Applications

Developer and End User Services: Web Components, Web Services API (SOAP), Java API, URL Addressability

Federation Services: Federated Search, Virtual Repositories, Metadata Mapping, View Services, Authentication/Security, Subscription Event Services (Subscriptions), Admin Tools

Integration Services: Access Services, Session Pools, Connector Service Provider Interface (SPI), Connectors, RMI Proxy, Web Services Proxy

Data Sources: one per connector


Integration Services – Unified Content Model

Content Functions
• CRUD
  - Metadata
  - Native content
• Content classes (meta-metadata)
• Check-in / check-out
• Versioning, version history
• Security
  - CRUD
• Renditions
• Annotations
• Compound documents

Folder Hierarchy Functions
• CRUD
• Content classes
• Folder contents
• Folders filed in
• Security


Integration Services – Unified Workflow Model

Work Item Functions
• CRUD
  - Work items
  - Attachments
• Work item classes (meta-metadata)
• Get history
• Complete
• Lock/Unlock
• Suspend
• Resume/Reassign
• Ad-hoc route

Workflow Functions
• Queue enumeration
• Groups for a queue
• Users for a queue
• Queue contents
• Security
  - CRUD


Integration Services - Connectors

• Connectors do the work to support the Integration Services content and workflow models
• Out-of-the-box connectors to many sources (~20)
• Numerous options for distributing connectors
  - Remote EJBs
  - RMI Connector
  - Web Services Connector
• Session pooling
  - Configurable stand-by pool of repository sessions improves performance and scalability
  - Supports named and generic users
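The idea of a stand-by pool of repository sessions can be sketched as follows. This is a minimal illustration, not the II CE implementation: `Session`, `SessionPool` and the `max_idle_per_key` setting are hypothetical names. Idle sessions are kept per (repository, user) pair, so a named user only ever reuses a session that was opened under that user's own identity.

```python
from collections import defaultdict


class Session:
    """Stand-in for an authenticated repository session (hypothetical)."""

    def __init__(self, repository: str, user: str):
        self.repository = repository
        self.user = user


class SessionPool:
    """Keeps idle sessions so a later request by the same user can reuse them."""

    def __init__(self, max_idle_per_key: int = 2):
        self.max_idle_per_key = max_idle_per_key
        self._idle = defaultdict(list)  # (repository, user) -> idle sessions
        self.logins = 0                 # counts how many real logins occurred

    def acquire(self, repository: str, user: str) -> Session:
        idle = self._idle[(repository, user)]
        if idle:
            return idle.pop()           # reuse a stand-by session
        self.logins += 1                # otherwise pay the cost of a new login
        return Session(repository, user)

    def release(self, session: Session) -> None:
        idle = self._idle[(session.repository, session.user)]
        if len(idle) < self.max_idle_per_key:
            idle.append(session)        # keep it on stand-by; otherwise drop it


pool = SessionPool()
first = pool.acquire("repo-a", "alice")
pool.release(first)
second = pool.acquire("repo-a", "alice")   # reuses alice's stand-by session
third = pool.acquire("repo-a", "bob")      # bob never receives alice's session
```

A generic (shared) user would simply be one more key in the same pool, for example `("repo-a", "generic")`.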


Integration Services - Connectors

Connector for Documentum
• Documentum 4i and workflow
• Documentum 5 and workflow

Connector for IBM
• DB2 Content Manager 8
• DB2 CM OnDemand
• WebSphere MQ Workflow
• WebSphere Portal Doc. Mgr.
• Lotus Notes
• Lotus Domino.Doc

Connector for FileNet
• FileNet Image Services and workflow
• FileNet Content Services
• FileNet Image Services Resource Adapter (ISRA)
• FileNet P8 Content Manager
• FileNet P8 BPM

Connector for Microsoft
• Microsoft Index Server/NTFS
• Microsoft SharePoint Services

Connector for OpenText
• OpenText Livelink

Connector for Stellent
• Stellent Content Server

Connector for Interwoven
• Interwoven TeamSite

Connector for Hummingbird
• Hummingbird Enterprise 2004 DM 5.1

Chargeable part numbers

RDBMS
• DB2 UDB
• Oracle
• Others through II Federation

Others
• Microsoft NTFS
• File system (sample)


Integration Services - Connectors

Connector SDK
• Same toolkit used to build all of our connectors
• Design is complete (mapping is what’s required!)
• ~25 content methods, ~15 workflow methods
• Repository profile to simplify development
• Simple Java classes

Robust J2EE benefits
• Immediate leverage by platform
• No platform bleed into connector

100% forward compatibility

Excellent docs / examples


Federation Services

Federated Search
• Cross-repository, data-mapped searches
• Metadata and full-text searches
• Parallel search, single unified result set
• “Actionable” results

Virtual Repositories
• Create virtual repositories of all the content and work items related to a specific project, process, business object or topic
• Works at many different levels of granularity
• Virtual Repositories can contain links to:
  - Content, work items, folders, queues, virtual folders, smart folders, URLs, custom objects
• Supplemental metadata and security
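The "parallel search, single unified result set" behaviour of federated search can be sketched as below. The two connector functions and their sample data are hypothetical stand-ins, not II CE connectors; the sketch only shows the fan-out and merge pattern using Python's standard `concurrent.futures` module.

```python
from concurrent.futures import ThreadPoolExecutor


def search_repo_a(query: str) -> list[dict]:
    # Hypothetical connector: returns hits already mapped to common field names.
    data = [{"title": "Contract 17", "repository": "A"},
            {"title": "Invoice 9", "repository": "A"}]
    return [hit for hit in data if query.lower() in hit["title"].lower()]


def search_repo_b(query: str) -> list[dict]:
    # Second hypothetical connector with its own sample data.
    data = [{"title": "Contract 42", "repository": "B"}]
    return [hit for hit in data if query.lower() in hit["title"].lower()]


def federated_search(query: str, connectors) -> list[dict]:
    """Run the query against every connector in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(connectors)) as executor:
        per_repo = list(executor.map(lambda search: search(query), connectors))
    merged = [hit for hits in per_repo for hit in hits]
    return sorted(merged, key=lambda hit: hit["title"])


results = federated_search("contract", [search_repo_a, search_repo_b])
titles = [hit["title"] for hit in results]
```

Each hit keeps its `repository` field, which is what would make a result "actionable": the caller knows where to go back to in order to retrieve or update the item.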


Federation Services

Metadata mapping (data maps)
• Discovery of content and workflow classes (schemas)
• Map disparate indexing schemes across repositories
  - Effective for search and CRUD
• Named data maps for specific applications or uses
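A data map can be pictured as a lookup from one common attribute name to each repository's native field name. The sketch below is a hypothetical illustration (the repository names, field names and helper functions are invented for the example); it shows the same map being used in both directions, which is why a data map is effective for search as well as for CRUD.

```python
# Hypothetical data map: common attribute name -> native field name, per repository.
DATA_MAP = {
    "repo_a": {"customer_id": "CUST_NO", "doc_date": "CREATED"},
    "repo_b": {"customer_id": "clientNumber", "doc_date": "dateFiled"},
}


def to_native_query(repository: str, common_query: dict) -> dict:
    """Translate a query on common attributes into the repository's own fields."""
    mapping = DATA_MAP[repository]
    return {mapping[name]: value for name, value in common_query.items()}


def to_common_record(repository: str, native_record: dict) -> dict:
    """Translate a native result back to the common attribute names."""
    reverse = {native: common for common, native in DATA_MAP[repository].items()}
    return {reverse[name]: value
            for name, value in native_record.items() if name in reverse}


query = {"customer_id": "1001"}
native_a = to_native_query("repo_a", query)
native_b = to_native_query("repo_b", query)
common = to_common_record(
    "repo_b", {"clientNumber": "1001", "dateFiled": "2004-05-01"})
```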

Authentication and single sign-on
• Authenticate once, access all repositories
• Still a specific named user
• Multiple directory services supported
  - LDAP
  - Active Directory
  - Embedded


Federation Services

View Services
• Server-side conversion
  - Electronic document to HTML
  - Image conversion and processing
• Client-side Java viewer component (JavaBean)
  - Image viewing and processing
  - Annotations
  - Printing
  - Signed Java applet


Federation Services

Subscription Event Services
• Automated change notifications on content, searches and workflow
• Items are subscribed to, then monitored for change
• Events can be handled by custom handlers (fax, e-mail, workflow, synchronization)
• API, framework
• Possible use cases:
  - Content integration: services to facilitate synchronization between repositories
  - Portal: portlet interfaces to provide subscription notification of changes to specific documents or work items
  - Collaboration: when a final contract is received, e-mail notifications should be sent to stakeholders
  - Publishing: notifications to agents, brokers, and end-users on policy addendums via a multi-channel strategy (e-mail, workflow, fax, etc.)
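The subscribe, monitor and notify cycle described above can be sketched as follows. This is an assumption-laden illustration rather than the product's mechanism: the slides do not say how changes are detected, so the sketch simply polls and compares content hashes, and `SubscriptionService` and its methods are hypothetical names.

```python
import hashlib


class SubscriptionService:
    """Polls subscribed items and calls handlers when their content changes."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable: item_id -> current content (bytes)
        self._fingerprints = {}    # item_id -> hash seen at the last check
        self._handlers = {}        # item_id -> list of handler callables

    def _hash(self, item_id):
        return hashlib.sha256(self._fetch(item_id)).hexdigest()

    def subscribe(self, item_id, handler):
        self._handlers.setdefault(item_id, []).append(handler)
        self._fingerprints.setdefault(item_id, self._hash(item_id))

    def poll(self):
        """Check every subscribed item once; notify handlers of changed items."""
        for item_id, handlers in self._handlers.items():
            current = self._hash(item_id)
            if current != self._fingerprints[item_id]:
                self._fingerprints[item_id] = current
                for handler in handlers:
                    handler(item_id)


store = {"contract-7": b"draft"}
notified = []
service = SubscriptionService(lambda item_id: store[item_id])
service.subscribe("contract-7", notified.append)  # stand-in for an e-mail handler
service.poll()                                     # nothing has changed yet
store["contract-7"] = b"final"
service.poll()                                     # change detected, handler runs
service.poll()                                     # no further change
```

Swapping `notified.append` for a function that sends e-mail, starts a workflow or copies the item to another repository gives the custom handlers listed in the use cases.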


Developer & End-user Services

Applications can work with IICE at any of these levels:
• Programmatic (API)
• Loosely coupled (Web Services)
• UI only (Web Components)
• By links (URL Addressability)

IICE web client requires no development effort
• It can be set up quickly to work with any number of repositories
• It can be customized in different ways for different users, if desired


Developer Services – Integration Options

Java API
• Rich object-oriented content management API
• Abstracts all J2EE and system architecture complexity
• Many good examples included
• Finally, developers can write generic, platform-independent ECM applications!

Web Services
• All the key integration capabilities also available through SOAP
• Supports both Java and .NET clients


Developer Services – Options (continued)

Web components
• Set of 20+ rich web UI components
• Building blocks for customizing applications or building custom applications
• J2EE-based - MVC, Struts, JSP, XSLT, JSR 168
• Components cooperate in coordinated component groups using a shared event model
• Create new custom components

URL addressability
• Create very loosely coupled applications using II Content Edition
• Use in Web applications, send in e-mail


Developer & End-user Services - Web client


Security

• Authentication and authorization are controlled by the underlying data sources; their security model is respected at all times
  - Sessions are created for each “user” with each data source by the data source’s authentication mechanism
  - Authorization of access to the data source is then controlled for that session by the data source
  - Having a single user for all users of a data source is discouraged
• IICE provides supplemental security services:
  - Single sign-on system
  - Supplemental authorization system
  - Identity-aware session pooling


Administration and Deployment

Administration
• Centralized configuration and logging
• Remote graphical and web-based administration
• Dynamic configuration (never take WebSphere down!)
• JMX-based administration of subscription event services

System architecture and deployment options
• Scale from one to dozens of servers
• Leverage J2EE application servers
  - Load balancing
  - Fault tolerance
• Network protocol options
  - EJB-to-EJB
  - RMI
  - SOAP


IICE Architecture

[Diagram: component view of an IICE deployment in a J2EE application server.]

Legend: Customer Code; 3rd Party / J2EE Application Server; II Content Edition Platform; Source Licensed to Customer; 3rd Party Repository

Servlet container: Web Application, Template Processor (JSP/XSLT), Viewer Servlet, Administrator Tool, Apache SOAP; a Viewer Applet runs on the client

EJB container: II CE Services, including Access Services, View Services, Server Result Set, Logging and Config; DB2

Applications: reach II CE Services through the Java API or through WSDL

Connectors: Connector 1 calls the Repository 1 API directly; Connector 2 is reached through the RMI Connector Proxy and RMI Connector Proxy Server and calls the Repository 2 API; Connector 3 is reached through the SOAP Connector Proxy (Apache SOAP) and calls the Repository 3 API

Repositories: Repository 1, Repository 2, Repository 3


Scaling Model – Single server or multi-server

[Diagram: two deployment layouts.]

Single server: Application → II Content Ed. API → Access Services → Connectors → Repositories

Multi server: several Applications, each using the II Content Ed. API, share multiple II Content Ed. servers (backed by DB2), which reach the Repositories through distributed Connectors

Scaling mechanisms in the multi-server layout:
• J2EE EJB clustering
• J2EE Servlet clustering
• RMI Connector pooling
• Web Services load balancing


II Content Edition and the II Portfolio

Information Integrator OmniFind Edition crawler
• Included with OmniFind
• Index and search enterprise content in the following repositories:
  - FileNet
  - Documentum
  - Hummingbird

Information Integrator Content Edition wrapper
• Wrapper included with IICE
• Access unstructured content from WebSphere II federated server
• Wrapper based on DB2 II 8.2’s Java wrapper SDK

RDBMS connector for IICE
• Read-only access to database tables (columns appear as attributes)
• DB2 UDB V8.2 or higher, Oracle 10g, and others through II federation


II Content Edition and WebSphere

WebSphere Application Server
• II Content Edition is a native J2EE application hosted entirely in WebSphere
• Leverages WebSphere for fault tolerance and load balancing (clustering)

WebSphere Portal
• JSR 168-based integration between II Content Edition web components and WebSphere Portal
• II Content Edition web client can be hosted in WebSphere or WebSphere Portal

Portal environments supported by IICE
• BEA WebLogic, IBM WebSphere and Microsoft SharePoint


What about JSR-170?

• What is it?
  - A standard for Java access to content repositories
  - Not yet widely supported, but that may change
• JSR-170 is an incomplete standard
  - Important ECM functionality is not covered (for example, no support for workflow)
  - IBM is part of the group working to address this in a future version
  - Only covers access to a single repository; no federation capabilities
  - Originally intended only as a standard for accessing content within web sites
• JSR-170 and IICE
  - Currently no support for JSR-170 in IICE (or any other IBM CM product)
  - However, since JSR-170 is just another type of repository, an IICE connector could easily be written for it if it ever becomes popular


WebSphere Information Integrator OmniFind Edition


How to Differentiate ECI and Search Opportunities

Enterprise Search
• One-way access
• Retrieval and display
• Casual access -- generally part of a knowledge management strategy
• Indexes underlying content
• Focused on speed/quality of results
• Always about leaving content in place

Enterprise Content Integration
• Bi-directional access
• Retrieval, display, full CRUD, conversion, browse, foldering, workflow, etc.
• Deeper access to content; generally part of a production app/workflow
• Accesses native content in real time
• Focused on full unification of systems
• Sometimes also about migrating content


Why Enterprise Search Matters

Business users need to find information quickly and easily
• High-quality end-user search
• Great diversity in end-user search requirements

IT managers need a framework to integrate unstructured information
• Easy to install, configure, and manage
• Enterprise scale
• Total Cost of Ownership – purchase cost, administration cost

Enterprise application developers need to:
• Integrate into existing portals and applications
• Develop with existing tools using existing skills

Information Management


UIMA: A new standard for content processing and text analysis

• Defines a common interface for integrating text analysis modules
  - Enables interoperability of different analytics solutions and enterprise applications
• Provides an SDK for building and composing text analytics
  - Enables development of new and re-use of existing components for analysis

[Diagram: text passes through a chain of Text Analysis Modules, aka “Annotators” (Identify Language, Find Words & Roots, Categorization, Named-entity Extraction, Identify Relationships); the Extracted Metadata and Facts feed a Database and a Search Index used by Applications.]

Identify Relevant Entities → Build Structure
• People, Places, Organizations, Relationships
• Parts, Problems, Conditions
• Topics, Products, Interests, Sentiment
• Times, Events, Threats, Plots, Associations


UIMA Component Architecture

Key Concepts
• Common Analysis Structure (“CAS”) enables pluggable Annotators
• Annotators can leverage CAS to build on each other
• Different “Annotators” are relevant for different collections
• Analysis results can be sent to multiple “Consumers”

[Diagram: Collection Processing Engine (CPE). Sources (Text, Chat, Email, Audio, Video) → Collection Reader → CAS Initializer → CAS → Aggregate Analysis Engine (Analysis Engines, each wrapping an Annotator) → CAS → CAS Consumers → Ontologies, Search Engine Index, DBs, Knowledge Bases.]
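The key concepts above can be made concrete with a toy pipeline. This is not the UIMA SDK (which is a Java framework); every name below is hypothetical. The sketch shows a shared analysis structure that annotators add to in sequence, a second annotator that builds on the first one's output, and two consumers that receive the same analysis results.

```python
import re


def make_cas(text: str) -> dict:
    """A toy shared analysis structure: the text plus accumulated annotations."""
    return {"text": text, "annotations": []}


def token_annotator(cas: dict) -> None:
    # Adds one annotation per word, recording its character span.
    for match in re.finditer(r"\w+", cas["text"]):
        cas["annotations"].append(
            {"type": "Token", "begin": match.start(), "end": match.end()})


def capitalized_annotator(cas: dict) -> None:
    # Builds on the Token annotations produced by the previous annotator.
    for ann in list(cas["annotations"]):
        word = cas["text"][ann["begin"]:ann["end"]]
        if ann["type"] == "Token" and word[0].isupper():
            cas["annotations"].append(
                {"type": "Capitalized", "begin": ann["begin"], "end": ann["end"]})


def run_pipeline(texts, annotators, consumers) -> None:
    for text in texts:                    # plays the role of the collection reader
        cas = make_cas(text)
        for annotate in annotators:       # aggregate engine: annotators in order
            annotate(cas)
        for consume in consumers:         # every consumer sees the same results
            consume(cas)


index = []   # stand-in for a search index
counts = []  # stand-in for a database table


def index_consumer(cas: dict) -> None:
    index.extend(cas["text"][a["begin"]:a["end"]]
                 for a in cas["annotations"] if a["type"] == "Capitalized")


def count_consumer(cas: dict) -> None:
    counts.append(sum(1 for a in cas["annotations"] if a["type"] == "Token"))


run_pipeline(["IBM ships OmniFind"],
             [token_annotator, capitalized_annotator],
             [index_consumer, count_consumer])
```

Because annotators communicate only through the shared structure, a different collection could use a different list of annotators without any change to the consumers.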


WebSphere Information Integrator OmniFind Edition

• Delivers enhanced results with sub-second response
  - Sophisticated relevancy algorithms for corporate content
• Scales for large collections or enterprises
  - Up to 20 million documents
  - 1000s of concurrent users
• Fits easily into enterprise applications
  - Java APIs
  - Document-level security
• Eases administration and maintenance
  - Analysis features all under the covers


OmniFind Key Technologies

[Diagram: processing stages, with Search Collections and Security spanning all of them.]

Enterprise Content Crawling
• Scalable Web crawler
• Data source crawlers
• Content push

Parsing/Tokenizing
• HTML/XML
• 200+ doc filters
• Advanced linguistics

Categorization
• Taxonomy
• Rule-based

Annotation
• Text analytics plug-in

Indexing
• Global analysis
• Static ranking
• Store

Searching
• Dynamic ranking
• Fielded search
• Dynamic summary
• Parametric search
• Spell checking


Search Quality: State of the Art Ranking

• Dynamic, term-based factors
  - (term freq) x (1/doc freq)
  - Lexical affinities
  - Where term is found - title, body, anchor text
  - Weight of text - bold, italic, relative font size
• Static or document-based factors
  - Metadata
  - Links
  - URLs
  - Duplicate detection
• Factor weighting dynamically adjusted based on the type of query
  - Navigational -- HR (e.g. anchor text weighted higher, link analysis, ...)
  - Informational – Changing intranet password (e.g. term frequency, lexical affinity, ...)
• Search quality tuned to collection type
  - Intranet (linked documents)
  - Based on date (newsgroup, document currency)
  - Document collection
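The first term-based factor, (term freq) x (1/doc freq), can be worked through on a tiny collection. The sketch below uses the raw product exactly as written on the slide; it is not OmniFind's ranking function, and the documents and the `score` helper are made up for the example. Search engines commonly use a log-scaled variant of the inverse document frequency, which this sketch does not attempt.

```python
from collections import Counter


def score(query_terms, document, collection):
    """Sum (term frequency) x (1 / document frequency) over the query terms."""
    term_counts = Counter(document)
    total = 0.0
    for term in query_terms:
        # Document frequency: how many documents contain the term at all.
        doc_freq = sum(1 for doc in collection if term in doc)
        if doc_freq:
            total += term_counts[term] * (1.0 / doc_freq)
    return total


collection = [
    ["change", "password", "intranet", "password"],
    ["change", "address"],
    ["holiday", "calendar"],
]
query = ["change", "password"]
scores = [score(query, doc, collection) for doc in collection]
best = max(range(len(collection)), key=lambda i: scores[i])
```

The first document wins because "password" is both frequent within it (term frequency 2) and rare across the collection (document frequency 1), while "change" appears in two documents and so contributes only 0.5 to each.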


Differentiated Value for IBM Clients

Enhances WebSphere Portal investments
• Accesses more sources
• Scales to larger implementations
• Leverages the taxonomy defined in the portal for navigation and classification
• Migrates rules for rule-based classification
• Surfaces a similar portlet with additional features

Extends DB2 Content Manager investment
• Integrates the DB2 Content Manager repository into enterprise search applications
• Provides a native DB2 Content Manager crawler

Leverages Notes and Domino investment
• Searches Lotus Notes file folders in enterprise search applications
• Provides a native Lotus Notes crawler
• Supports native Domino security, meaning it will allow authorized searches down to an application level


Comparing OmniFind to Other IBM Search Offerings

WebSphere Portal 5.1 Search Engine
• Embedded portal search technology to index and search Portal portlets and pages via Portal Site search technology, Web content (includes ILWWCM Web published content), Portal Document Manager content, and attachments

Lotus Extended Search 4.0.2
• Search broker technology included w/ WebSphere Portal Extend
• Brokers search across supported data sources and indexes
• Complements WebSphere Portal search capabilities with reach to additional data, content and index search sources

Workplace 2.5 Search
• Embedded search included in Workplace products
• Index/search capabilities to: Team Collaboration, Web Conferencing, Collaborative Learning, People Finder, Workplace Messaging (client)

WebSphere II OmniFind Edition V8.2
• Enterprise search engine for intranets, extranets, public corporate websites and industry applications
• Enterprise scale and broad reach to additional content sources for upgrading WebSphere Portal, Lotus Domino and Workplace customers


IBM Search Competition

Competitor weaknesses
• Verity, Autonomy, FAST, Convera
  - Verity is largest with 450 employees and $113M revenue (2003)
  - Small companies with limited resources
  - Narrow focus on indexing and retrieval
  - Most products are very complex with high TCO
  - Significantly higher price points
• Google (Google Search Appliance)
  - Focused on Internet search
• Microsoft
  - Entering market. Initial focus on Internet search, unlikely to reach non-MS content repositories.

IBM strengths
• IBM views search as an extension to a comprehensive information integration infrastructure that includes information management, content management and information retrieval technologies required for building enterprise applications in the on demand era.
• Customers understand they can rely on IBM as a long-term partner
• Search technology is a strategic element in IBM’s software stack and, as such, is receiving tremendous focus and investment, $50M annually


What about the Google Search Appliance?

Google - focused on the retail internet market
• Advertising revenue focused - 95% of revenue
• Page-ranking system is not optimized for enterprises

Corporate intranets - fundamentally different:
• Less content, lower chance of finding the perfect match
• Poorly linked; linking process more centrally controlled
• Content stored in many different systems besides web
• Enterprise security needs differ
• “Black box” offering runs counter to enterprise HW/OS standards

OmniFind is designed to produce the best results from enterprise content


Proven Quality, Scale and Robustness on IBM Intranet

• Quality – preferred 2:1 over prior technology
• Scale -- 80K queries/day with sub-second response over 7M pages
  - Indexes: 9 M unique pages, 10,000 websites, 20 K per document
  - Processes: 80 K queries/day, 7 K queries/hour peak, stressed to 10x higher
• Robustness -- 99.9% availability since Sept.
  - 24x7 operation

[Diagram: 1 – Crawling (Crawler) → 2 – Parsing & Tokenizing → 3 – Indexing Build & Push (Indexer) → 4 – Searching (Search Servers)]
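The load figures on this slide can be put in per-second terms with a little arithmetic. The only inputs are the two numbers stated above; reading "stressed to 10x higher" as ten times the peak hourly rate is an assumption, since the slide does not say what the factor is relative to.

```python
QUERIES_PER_DAY = 80_000
PEAK_QUERIES_PER_HOUR = 7_000

# Average hourly load if the daily total were spread evenly over 24 hours.
average_per_hour = QUERIES_PER_DAY / 24

# How much busier the peak hour is than an average hour.
peak_to_average = PEAK_QUERIES_PER_HOUR / average_per_hour

# Peak load expressed per second.
peak_per_second = PEAK_QUERIES_PER_HOUR / 3600

# Assumption: "10x higher" is read as ten times the peak hourly rate.
stress_per_second = 10 * peak_per_second
```

Under that reading, the peak is roughly 2 queries per second (about 2.1 times the daily average rate), and the stress test corresponds to roughly 19 queries per second.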


The w3.ibm.com web site runs IBM search today!


Enter the search terms that you want, such as:

How to change CMVC password

Advanced Search for more options

Tabs allow you to direct where you want to look.


Excellent results! Note highlighted terms and word “stemming”.

“Lexical affinity” – the proximity of the words helps in search quality

Quicklinks are predefined searches

Each summary is built dynamically – it is based on the search terms that you entered.


Advanced Search gives a range of options for focusing your search


Business Partner Extensions

• Endeca: multi-faceted, or guided, navigation
• iPhrase: self-service applications, natural language processing
• Athoc: subscription and notification
• Sun & Son: expertise location, taxonomy services, knowledge management
• Muse Global: access to specialized data sources
• Upshot: document intelligence


When to choose OmniFIND versus WebSphere Content Discovery Server

Choose OmniFIND when:
• The need is for bi-directional languages
• The need is text analytics


Summary

Most companies have multiple content management systems

Information in these systems is often isolated from key applications and security/compliance policies

IBM offers powerful, well differentiated products for content integration and enterprise search that enable organizations to better leverage and control distributed content assets

These products are key to business solutions related to customer service, records management/compliance, research & intelligence, various production workflows, and more


Questions?