The Next-Generation SharePoint: Powered by Text Analytics

Posted on 17-Jun-2015


TRANSCRIPT

PLATINUM SPONSOR

GOLD SPONSORS

THE NEXT-GENERATION SHAREPOINT:

POWERED BY TEXT ANALYTICS

Alyona Medelyan (Pingar), @zelandiya

AGENDA

• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions

Information tasks
What do they cost us? How does SharePoint help?

Avg. hours per week spent on information tasks:

• Emails: 14.5
• Creating docs: 13.3
• Analyzing info: 9.6
• Searching: 9.5
• Reviewing: 8.8
• Gathering info: 8.3
• Organizing docs: 6.8
• Creating presentations: 6.7
• Creating images: 5.6
• Data entry: 5.6
• Doc approval: 4.3
• Publishing: 4.2
• Translating: 1

= $37K per year per person

Source: IDC, Hidden Cost of Information (2005)


SHAREPOINT SAVES TIME

• Interact with SharePoint from Outlook
• Create docs collaboratively
• Customize search configuration
• Define Managed Metadata
• Configure forms
• Design workflows
• Use sites, sets & libraries

Text Analytics
What is it and how does it work? What tasks does it solve?

WHAT IS TEXT ANALYTICS?

Text mining / natural language processing of unstructured data
• Applications: opinion mining, business intelligence, document organization, data extraction, search
• Built on: machine learning, text processing, statistics, linguistics


TEXT ANALYTICS SAVES MORE TIME

• Compose search reports
• Extract entities
• Redact
• Generate metadata
• Fill databases
• Cluster search results
• Summarize
• Mine opinions & sentiment
• Profanity check

… automatically

Text Analytics Software
What companies offer text analytics? What are open-source tools like?

TEXT ANALYTICS: GLOBAL PERSPECTIVE

User adoption grew by 25% in 2010, creating an $835 million market, because:

• Unstructured data is growing (e.g. social media) → text analytics!

• Text analytics is central to effective information access

• Many successes in NLP: IBM Watson, Wolfram Alpha

Full report by Seth Grimes: http://altaplana.com/TA2011

APPLICATIONS OF TEXT ANALYTICS

• Search & info access: 39%
• Customer experience management: 39%
• Brand management: 39%
• Research: 36%
• Competitive intelligence: 33%
• Customer service: 26%
• E-discovery: 15%
• Life sciences: 15%
• Product design: 15%
• Online commerce: 11%
• Finance: 10%
• Other: 9%
• Content management: 8%
• Insurance & fraud: 8%
• Military intelligence: 7%
• Law enforcement: 6%

Source: http://altaplana.com/TA2011

SEARCH & INFO ACCESS: METADATA EXTRACTION

Document → Metadata
• Easy to extract: file type, name & location, creation & modification date, authors
• Difficult to extract: keywords, people & companies mentioned, suppliers & addresses mentioned
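The "easy" metadata above is available from the file system without any text analysis. A minimal Python sketch, assuming a local file; `easy_metadata` is a hypothetical helper, the creation date is left out because its meaning differs between operating systems, and authors would need a format-specific reader (for example Office document properties), which is not shown:

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def easy_metadata(path):
    """Collect file-system metadata that needs no text analytics (hypothetical helper)."""
    p = Path(path)
    st = p.stat()
    mime_type, _ = mimetypes.guess_type(p.name)
    return {
        "name": p.name,
        "location": str(p.resolve().parent),
        "file_type": p.suffix.lstrip(".").lower() or None,
        "mime_type": mime_type,
        "size_bytes": st.st_size,
        # Creation time is platform-dependent (st_ctime means different things
        # on Windows and Linux), so only the modification time is reported.
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Everything on the "difficult" list, by contrast, requires reading and interpreting the text itself.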

SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Keywords

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
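The first step, generating candidates, can be sketched in Python as word n-grams that do not cross punctuation and neither start nor end with a stopword. This is one common candidate-generation heuristic, not necessarily the one Pingar uses; the stopword list and the `candidates` function are illustrative assumptions:

```python
import re

# A deliberately small, hypothetical stopword list; real systems use larger ones.
STOPWORDS = {
    "a", "all", "also", "and", "any", "are", "as", "be", "has", "have", "hi",
    "if", "is", "it", "most", "of", "please", "that", "the", "this", "to",
    "today", "with", "you",
}


def candidates(text, max_len=3):
    """Return word n-grams (1..max_len) that neither start nor end with a stopword."""
    found = set()
    # Punctuation breaks phrases, so no n-gram spans a comma or a full stop.
    for chunk in re.split(r"[^\w\s]+", text):
        words = chunk.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                found.add(" ".join(gram))
    return found
```

On the example email this keeps phrases such as "heat rate charts" and "bella santuri" while dropping "the interface" and "of metastock". It still over-generates (e.g. "looks different"), which is why the later steps score candidates.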


SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Properties → Keywords

Candidate properties: frequency, position, corpus stats, relatedness
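Each candidate is then described by properties. The sketch below computes three of the four listed: frequency, position of first occurrence, and a corpus statistic (here inverse document frequency over a small background corpus). Relatedness needs a semantic resource and is omitted. The function name and the exact formulas are assumptions for illustration:

```python
import math
import re


def properties(candidate, document, corpus):
    """Compute simple properties for one keyword candidate (hypothetical feature set).

    freq      - how often the candidate occurs in the document
    first_pos - relative position of its first occurrence (0.0 = very start)
    idf       - corpus statistic: the rarer across the background corpus, the higher
    """
    pattern = r"\b" + re.escape(candidate.lower()) + r"\b"
    text = document.lower()
    hits = [m.start() for m in re.finditer(pattern, text)]
    doc_freq = sum(1 for d in corpus if re.search(pattern, d.lower()))
    return {
        "freq": len(hits),
        "first_pos": hits[0] / len(text) if hits else 1.0,
        "idf": math.log((len(corpus) + 1) / (doc_freq + 1)),  # smoothed IDF
    }
```

In the example email, "MetaStock" occurs twice and early, and a generic word like "new" would get a lower IDF from almost any background corpus.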

SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Properties → Scoring → Keywords

Scoring: heuristic scoring or machine learning
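For the heuristic branch, the properties are combined into a single score by hand; the machine-learning branch would instead learn the weighting from documents with annotated keywords. A minimal sketch of the heuristic option, with a hypothetical formula (TF × IDF with a bonus for early mentions) and illustrative property values:

```python
def heuristic_score(props):
    """Hypothetical heuristic: TF x IDF, with a bonus for candidates that appear early."""
    position_bonus = 2.0 - props["first_pos"]  # 2.0 at the very start, 1.0 at the end
    return props["freq"] * props["idf"] * position_bonus


def top_keywords(candidate_props, k=3):
    """Rank candidates by heuristic score and keep the best k."""
    ranked = sorted(
        candidate_props,
        key=lambda c: heuristic_score(candidate_props[c]),
        reverse=True,
    )
    return ranked[:k]


# Illustrative property values, not measured ones.
example = {
    "metastock": {"freq": 2, "idf": 1.1, "first_pos": 0.05},
    "new features": {"freq": 1, "idf": 0.4, "first_pos": 0.6},
    "heat rate charts": {"freq": 1, "idf": 1.1, "first_pos": 0.4},
}
```

With these values, `top_keywords(example, k=2)` returns `["metastock", "heat rate charts"]`. Hand-set weights are easy to reason about, but they have to be re-tuned for every new kind of document, which is the argument for learning them instead.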

SEARCH & INFO ACCESS: NAMES EXTRACTION

Document → Names

Example: If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Training data (annotations) → Examples → Properties → Learning
• Properties: NLP, heuristics, text mining
• Learning: machine learning
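The learning setup on this slide (annotated examples, token properties, a learned model) can be sketched with a tiny perceptron that labels each token as part of a person name or not. Everything here is an assumption for illustration: the three training sentences, the capitalization and context features, and the choice of a perceptron. Production extractors use far more data and richer features:

```python
import re
from collections import defaultdict

# Hypothetical annotated training data: 1 = token is part of a person name.
TRAINING = [
    ("Please contact John Smith for details.", [0, 0, 1, 1, 0, 0, 0]),
    ("Ask Mary Jones about the new Interface.", [0, 1, 1, 0, 0, 0, 0, 0]),
    ("MetaStock has several new functions.", [0, 0, 0, 0, 0, 0]),
]


def tokenize(text):
    # Words and individual punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def features(tokens, i):
    """Binary properties of token i: capitalization and its immediate context."""
    feats = {"bias"}
    if tokens[i].istitle():
        feats.add("title")
    if i == 0:
        feats.add("first")
    else:
        prev = tokens[i - 1]
        feats.add("prev=" + prev.lower())
        if prev.istitle():
            feats.add("prev_title")
    if i + 1 < len(tokens) and tokens[i + 1].istitle():
        feats.add("next_title")
    return feats


def train(examples, epochs=10):
    """Classic perceptron: adjust weights only when a token is misclassified."""
    weights = defaultdict(int)
    for _ in range(epochs):
        mistakes = 0
        for text, labels in examples:
            tokens = tokenize(text)
            for i, gold in enumerate(labels):
                feats = features(tokens, i)
                pred = 1 if sum(weights[f] for f in feats) > 0 else 0
                if pred != gold:
                    mistakes += 1
                    step = 1 if gold == 1 else -1
                    for f in feats:
                        weights[f] += step
        if mistakes == 0:  # every training token is classified correctly
            break
    return weights


def extract_names(text, weights):
    """Label each token, then join consecutive name tokens into names."""
    tokens = tokenize(text)
    names, current = [], []
    for i, tok in enumerate(tokens):
        if sum(weights.get(f, 0) for f in features(tokens, i)) > 0:
            current.append(tok)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names
```

With these three training sentences, `extract_names("If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.", train(TRAINING))` returns `['Bella Santuri']`. The features learned from "contact John Smith" carry over to a name the model has never seen.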

<SEARCH + TEXT ANALYTICS> COMPANIES

Pingar, BasisTech, AlchemyAPI, Language Computer, OpenCalais, Extractiv

BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS

Documents (reviews, tweets, surveys) → Sentiment analysis → Visualization, Summary

Naïve approach: a sentiment-words dictionary!
• Positive: fantastic, excellent, awesome
• Negative: suck, terrible, awful

BUT: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."

No sentiment words!
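The naïve dictionary approach amounts to counting sentiment words. This Python sketch uses only the six words from the slide (a hypothetical, tiny lexicon) and shows why the fragrance review defeats it: with no dictionary hits, it comes out neutral.

```python
import re

# Sentiment words taken from the slide; a real lexicon has thousands of entries.
POSITIVE = {"fantastic", "excellent", "awesome"}
NEGATIVE = {"suck", "terrible", "awful"}


def naive_sentiment(text):
    """Count positive minus negative dictionary hits and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

`naive_sentiment("If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.")` returns `"neutral"`, although any human reader sees a clearly negative review. Exact matching also misses inflections such as "sucks".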

BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS

Documents (reviews, tweets, surveys) → Machine learning → Visualization, Summary

Training data (annotations) → Examples → Properties → Learning
• Properties: presence, position, part-of-speech, negation, generalization
• Learning: lexicon induction

Important: identifying sentiment-bearing sentences and attaching sentiment to a topic!
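Two of the listed properties, presence and position, plus negation handling, can be sketched as feature extraction for a classifier. The negation step uses a common trick: prefix `NOT_` to every word between a negation cue and the next punctuation mark. Part-of-speech features, generalization and lexicon induction are omitted, and the function names and the "last third" position rule are assumptions:

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_PUNCT = {".", ",", "!", "?", ";", ":"}


def tokenize(text):
    # Keep contractions such as "don't" together as a single token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text.lower())


def mark_negation(tokens):
    """Prefix NOT_ to every word between a negation cue and the next clause punctuation."""
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_PUNCT:
            negated = False
            out.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


def sentiment_features(text):
    """Presence features, plus an END_ copy for words in the last third (position)."""
    tokens = mark_negation(tokenize(text))
    cut = len(tokens) * 2 // 3  # index where the last third starts
    feats = set()
    for i, tok in enumerate(tokens):
        if tok in CLAUSE_PUNCT:
            continue
        feats.add(tok)
        if i >= cut:
            feats.add("END_" + tok)
    return feats
```

For "The plot was not good, but the ending was great." the features include `NOT_good` (not `good`) and `END_great`, so a classifier can learn that negated praise and a positive closing verdict point in different directions.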

SENTIMENT ANALYSIS COMPANIES

Attensity, AlchemyAPI, Lexalytics, Saplo, Medallia, SAS

RESEARCH: TEXT SUMMARIZATION

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Parts of the message: Address, Announcement, Details, More details, Conclusion

• Extractive summary: As of today, MetaStock has several new functions.
• Sentence compression: MetaStock has several new functions. The new interface looks different.
• Abstractive summary: MetaStock has new features and a new interface.
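An extractive summary copies whole sentences from the source. A minimal frequency-based sketch (a Luhn-style heuristic, not necessarily what the vendors listed next use): score each sentence by how frequent its content words are across the whole text, keep the top n, and restore their original order. Sentence compression and abstractive summaries need considerably more machinery and are not shown; the stopword list is an assumption:

```python
import re
from collections import Counter

# Hypothetical stopword list; content words are everything else.
STOPWORDS = {
    "a", "all", "also", "and", "any", "are", "as", "be", "has", "have", "hi", "if",
    "is", "it", "most", "of", "please", "that", "the", "this", "to", "today", "with", "you",
}


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, n=1):
    """Pick the n sentences whose content words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]  # keep the original sentence order
```

Because the output is always a subset of the input sentences, it can never produce the compressed or abstractive versions shown on the slide.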

TEXT SUMMARIZATION COMPANIES

Lexalytics, Pingar

COMPETITIVE INTELLIGENCE: ENTITY & ENTITY RELATION EXTRACTION

Companies: OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta

FRAUD INVESTIGATION: NORMALIZATION OF DATES & NAMES

Companies: Cicero, BasisTech
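Normalization maps different surface forms of the same date or name onto one canonical form so that records can be matched. A minimal sketch, assuming day-first numeric dates and "Last, First" name ordering; the format list and function names are hypothetical, and `str.title()` will mangle names such as "McDonald":

```python
from datetime import datetime

# Hypothetical list of accepted input formats; numeric dates are read day-first.
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%d-%b-%Y"]


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(text):
    """Collapse whitespace, turn 'Last, First' into 'First Last', and fix case."""
    text = " ".join(text.split())
    if "," in text:
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    return text.title()
```

"17-Jun-2015", "June 17, 2015" and "17/06/2015" all become `2015-06-17`, and "SANTURI, bella" becomes "Bella Santuri". A US-style "06/17/2015" returns None here; ambiguous formats are exactly where commercial normalizers earn their keep.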

OPEN-SOURCE TOOLS

• NLTK – Apache license, Book, Python & academic datasets, nltk.org

• LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segment, alias-i.com/lingpipe

• OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp

• GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk

• Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu

APIs
What’s an API and how does it work? What are the advantages of the API model? Which API is the right one for you?

API ENGINE

Software engine: solves a specific task
API: an interface that ensures communication

API ACCESS

A developer creates an application that:
• calls the API via a web service
• includes API authentication
• sends each call as an XML message describing the request
• follows a protocol that specifies how the XML needs to be encoded: SOAP or REST

SDK: usage examples
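To make "a call is an XML message describing the request" concrete, here is a sketch of a SOAP 1.1 request envelope built with Python's standard library. It carries an API key in the header for authentication and the operation in the body. The service namespace, the `ExtractKeywords` operation and the `ApiKey`/`Text` element names are hypothetical; a real service defines these in its WSDL, and its SDK usually generates this XML for you:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
API_NS = "urn:example:text-analytics"  # hypothetical service namespace


def soap_request(operation, api_key, text):
    """Build a SOAP 1.1 envelope: credentials in the Header, the request in the Body."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("ta", API_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{API_NS}}}ApiKey").text = api_key
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{API_NS}}}{operation}")
    ET.SubElement(call, f"{{{API_NS}}}Text").text = text
    return ET.tostring(envelope, encoding="unicode")
```

The resulting string would be POSTed to the service endpoint; with REST, the same request would typically be expressed as URL parameters instead.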

REST API ACCESS FROM A BROWSER

API request: http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration

API response
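The browser request above is just a URL with URL-encoded parameters. This Python sketch rebuilds exactly that request string with the standard library. It does not send it, since this 2011-era Yahoo endpoint is very likely no longer available; sending a request to a live REST API would be an ordinary HTTP GET, for example with `urllib.request.urlopen(url)`. The helper name is hypothetical:

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def build_rest_request(base_url, **params):
    """Encode parameters as a REST query string (spaces become '+')."""
    return base_url + "?" + urlencode(params)


url = build_rest_request(
    BASE_URL,
    appid="YahooDemo",
    query="madonna",
    context="Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration",
)
```

`url` is identical to the request shown on the slide, including the `+`-encoded spaces in the `context` parameter.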

SOAP API ACCESS FROM VS2010

SOAP API ACCESS IN POWERSHELL

Read the complete blog post “Bulk metadata extraction in SharePoint”: http://bit.ly/powershell-migrate

API = EASY INTEGRATION & FLEXIBILITY

• Integrate into existing architecture via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company: little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
  • Host the API on site = secure data exchange
  • Access the API in the cloud = save on tech support & hardware

WHICH API IS BEST FOR YOU?

Task: "I need to take some text and get a list of the important entities/keywords/phrases."

Blog post on API comparison: faganm.com/blog

APIs compared: Yahoo! Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI, Evri

Criteria: API restrictions, supported languages, quality of results, semantic links, synonyms/duplicates

HOW TO CHOOSE AN API:

• Define a specific task
• Think about which features are important
• Get prepared:
  • Sign up for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
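The last three steps (representative data, a test framework, compared results) can be as small as scoring each API's keywords against a hand-made gold standard. A sketch using precision, recall and F1; the API names and data are placeholders, and exact string matching after lowercasing is a simplifying assumption that ignores the synonyms/duplicates criterion:

```python
def prf(predicted, gold):
    """Precision, recall and F1 of predicted keywords against a gold standard (case-insensitive)."""
    pred = {p.lower() for p in predicted}
    ref = {g.lower() for g in gold}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def compare_apis(outputs, gold):
    """Average F1 per API over the same documents, best first.

    outputs: {api_name: [keywords for doc 1, keywords for doc 2, ...]}
    gold:    [gold keywords for doc 1, gold keywords for doc 2, ...]
    """
    averages = {}
    for api, per_doc in outputs.items():
        scores = [prf(pred, ref)[2] for pred, ref in zip(per_doc, gold)]
        averages[api] = sum(scores) / len(scores) if scores else 0.0
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `prf(["MetaStock", "charts"], ["metastock", "heat rate charts"])` returns `(0.5, 0.5, 0.5)`. Running every candidate API over the same representative documents keeps the comparison fair.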

METADATA EXTRACTION IN SHAREPOINT

Demo: Pingar’s add-on for SharePoint 2010, built using a text analytics API

INTEGRATING APIS INTO SCANNING

Video: Using Fuji Xerox SmartConnect and the Pingar API to scan documents into SharePoint in batch

http://www.youtube.com/watch?v=kluVp25upag

THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS

• What can be automated?
  Metadata extraction, data entry, opinion mining, sanitization, doc approval, summarization, …
• How to integrate text analytics into existing SharePoint applications?
  Easy! Via an API
• How to find the right text analytics API?
  Review what’s available, set up an experiment, compare results

Thank you to all of our Sponsors

PLATINUM SPONSOR

SILVER SPONSORS

GOLD SPONSORS

BRONZE SPONSORS
