Learn more about Entity Extraction May 2014

2022-06-27

Upload: anders-haeggdahl

Post on 25-Jun-2015


DESCRIPTION

Read this and get inspired on how to use entity extraction to make better use of the unstructured information assets in your organization. This presentation was made by my colleague Paula Petcu.

TRANSCRIPT

Page 1: Learn more about Entity Extraction May 2014

13 april 2023

Page 2: Learn more about Entity Extraction May 2014

Overview of scenarios

Page 3: Learn more about Entity Extraction May 2014

Scenarios | Benefits of using entity extraction

Explore your content | Explore the enterprise graph

Discover insights about your products | Monitor trends

Discover new expertise inside your organization | Find the people with the right competences

Enhance search navigation | Filter unstructured data

Page 4: Learn more about Entity Extraction May 2014

Scenarios | Benefits of using entity extraction

Prevent duplicate work | Find similar content

Help your users find their dream home | Extract potential decision criteria from natural language

Visualize your content in a new way | Enrich documents with metadata

Page 5: Learn more about Entity Extraction May 2014

Discover new expertise inside your organization | Find the people with the right competences

Page 6: Learn more about Entity Extraction May 2014

Motivation

• Search for “usability”

• Only people who have tagged themselves with “usability” will be returned

• If we rely only on standard category types and database information, we get only what is in the people database

• But what if you could also find those who write, blog, or tweet about “usability”, without them being explicitly tagged with this category?

Page 7: Learn more about Entity Extraction May 2014

Enhanced search index

• The search index is enhanced with information about what topics, keywords, people, places, etc. authors write about
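As a rough illustration of the idea on this slide, the sketch below builds a small index from extracted terms to the authors who write about them. It is not the presenters' implementation: the documents, the `build_expertise_index` and `find_experts` helpers, and the assignment of terms to authors are all assumptions made for the example (the names are borrowed from the sample text later in the deck).

```python
from collections import defaultdict

# Hypothetical documents: each has an author and the entities/keywords
# that an extractor found in its text (values invented for illustration).
documents = [
    {"author": "Sarah Jensen", "entities": ["usability", "Copenhagen"]},
    {"author": "Carl Sorensen", "entities": ["search", "usability"]},
    {"author": "Sarah Jensen", "entities": ["usability", "accessibility"]},
    {"author": "Anders Anderson", "entities": ["e-commerce"]},
]


def build_expertise_index(docs):
    """Map each extracted term to a per-author count of documents."""
    index = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        # Count a term once per document, even if it was extracted twice.
        for term in set(doc["entities"]):
            index[term.lower()][doc["author"]] += 1
    return index


def find_experts(index, term):
    """Return authors who write about the term, most documents first."""
    counts = index.get(term.lower(), {})
    return sorted(counts, key=lambda author: (-counts[author], author))


index = build_expertise_index(documents)
print(find_experts(index, "usability"))
```

With such an index, a search for "usability" can return people who write about the topic even if they never tagged themselves with it.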

Page 8: Learn more about Entity Extraction May 2014

• Search for “usability”

• Get improved search results

Discover competences people have

Discover interests people have and share

Gather all people writing about the same topic

Enhanced expertise search

Page 9: Learn more about Entity Extraction May 2014

Enhance search navigation | Filter unstructured data

Page 10: Learn more about Entity Extraction May 2014

Motivation

• Search for “yoga”

• Lots of semi-structured documents (HTML, Word, PDF, etc.)

• Some are missing administrative metadata such as author, date last saved

• Some are missing descriptive metadata such as title, topic, tags, category

No proper title

Will you go through all results to find the relevant ones?

Page 11: Learn more about Entity Extraction May 2014

Extract named entities and metadata

• Identify and add to the document information such as title, keywords, author, summary, subsection titles

Page 12: Learn more about Entity Extraction May 2014

New filters and improved metadata

• Search for “yoga”

• The newly created data is used to filter documents and improve relevance

Improved visual results (documents have titles)

Improved relevance (titles and subsection titles are ranked higher than body text)

Possibility to filter on authors, topics, places, etc. (use the filter rather than pagination)
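The two effects described on this slide (titles and subsection titles ranked above body text, and filtering on extracted fields such as author) can be sketched as follows. This is a minimal sketch under stated assumptions: the documents, the `WEIGHTS` values, and the `score` and `search` helpers are hypothetical and only illustrate the principle.

```python
# Hypothetical enriched documents; all field values are invented.
docs = [
    {"id": 1, "title": "Yoga for beginners", "subsections": ["Breathing"],
     "body": "An introduction to yoga practice.", "author": "Sarah Jensen"},
    {"id": 2, "title": "Office wellness programme", "subsections": ["Yoga classes"],
     "body": "Weekly activities for employees.", "author": "Carl Sorensen"},
    {"id": 3, "title": "Canteen menu", "subsections": [],
     "body": "After lunch there is yoga in room 2.", "author": "Sarah Jensen"},
]

# Assumed weights: title > subsection title > body text.
WEIGHTS = {"title": 3.0, "subsections": 2.0, "body": 1.0}


def score(doc, term):
    """Sum weighted occurrences of the term across the document fields."""
    term = term.lower()
    total = 0.0
    for field, weight in WEIGHTS.items():
        value = doc[field]
        text = " ".join(value) if isinstance(value, list) else value
        total += weight * text.lower().split().count(term)
    return total


def search(docs, term, author=None):
    """Rank matching documents; optionally filter on the extracted author."""
    hits = [d for d in docs if score(d, term) > 0]
    if author is not None:
        hits = [d for d in hits if d["author"] == author]
    return sorted(hits, key=lambda d: score(d, term), reverse=True)


print([d["id"] for d in search(docs, "yoga")])
print([d["id"] for d in search(docs, "yoga", author="Sarah Jensen")])
```

The author filter narrows the result list directly, which is the alternative to paging through results that the slide points to.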

Page 13: Learn more about Entity Extraction May 2014

Explore your content | Explore the enterprise graph

Page 14: Learn more about Entity Extraction May 2014

Motivation

• Search for ‘Copenhagen’ on your intranet

• Ambiguous query

• Lots of results

• Missing context

• What is the user intent with this query?

Page 15: Learn more about Entity Extraction May 2014

Relationship Extraction for Entities

• Extract relations from unstructured data

• Built upon named entity recognition

• Relationship extraction enables us to build a graph search solution with unstructured data

Lorem ipsum dolor sit amet Sarah Jensen, consectetur adipiscing elit Philadelphia et Copenhagen. Fusce nec placerat libero. Suspendisse nibh quam, sodales in posuere ac, porttitor non erat. Sed semper sodales varius. Fusce elementum Findwise, enim sed semper ultrices Carl Sorensen, nisl ligula consectetur sapien, non feugiat sapien enim id quam. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nullam egestas non velit nec accumsan. Google at orci augue.

Proin tempus tristique arcu, a lobortis diam tempus ut. Nam arcu risus, tempor nec elit eu, Anders Anderson posuere viverra mauris. Donec tempor in magna in mollis. Suspendisse in elementum magna. Findwise in faucibus sapien, et Microsoft. Fusce ullamcorper malesuada sapien, sit amet viverra odio bibendum sed. Fusce molestie vel tortor nec eleifend. Nullam et leo ac felis iaculis convallis.


Sarah Jensen

Philadelphia Copenhagen

Google

Anders Anderson

Findwise

Microsoft

Carl Sorensen
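One simple way to derive relations on top of named entity recognition is sentence-level co-occurrence: two entities mentioned in the same sentence get an edge in the graph. The sketch below applies that idea to an excerpt of the sample text, using the highlighted names as a dictionary. Co-occurrence is an assumption chosen for illustration, not necessarily the technique behind the slide, and the entity type labels and the `cooccurrence_edges` helper are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Entity dictionary built from the names highlighted in the sample text.
# The type labels are assumptions.
ENTITIES = {
    "Sarah Jensen": "person", "Carl Sorensen": "person",
    "Anders Anderson": "person",
    "Philadelphia": "place", "Copenhagen": "place",
    "Findwise": "company", "Google": "company", "Microsoft": "company",
}

# Shortened excerpt of the placeholder text shown on the slide.
text = (
    "Lorem ipsum dolor sit amet Sarah Jensen, consectetur adipiscing elit "
    "Philadelphia et Copenhagen. Fusce elementum Findwise, enim sed semper "
    "ultrices Carl Sorensen, nisl ligula consectetur sapien. "
    "Findwise in faucibus sapien, et Microsoft."
)


def cooccurrence_edges(text, entities):
    """Count how often two known entities appear in the same sentence."""
    edges = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted(name for name in entities if name in sentence)
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(text, ENTITIES)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (weight {weight})")
```

The resulting edge list is the raw material for a graph: following edges from "Copenhagen" leads to the people and places mentioned alongside it.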


Page 16: Learn more about Entity Extraction May 2014

Suggestions as you type, using the graph

• Search for ‘Copenhagen’ on your intranet

Narrow down search results directly from the search box

Disambiguate the query by selecting one of the different types of suggestions (consultants, projects, partners)

Navigate directly to 2nd or higher level connections on the graph

Page 17: Learn more about Entity Extraction May 2014

Business Intelligence, using the graph

• Search for: ‘Customers where we have done Projects based on Google technology with at least 1000 hours of consulting time and a revenue of more than 1 MDKK, and where the word “e-commerce” is mentioned many times in the Project Documentation’

Business Intelligence

Project numbers (worked hours)

Financial numbers (revenue, profits)

Project Documentation

What would this query look like in SQL?
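The query on this slide mixes structured figures (hours, revenue) with a signal from unstructured text (mentions of "e-commerce"). Once both live in the same enriched records, it reduces to a simple filter, as in this sketch. All records, field names, and the `MIN_MENTIONS` threshold standing in for "many times" are hypothetical.

```python
# Hypothetical project records combining project numbers, financial
# numbers, and the text of the project documentation.
projects = [
    {"customer": "Customer A", "technology": "Google", "hours": 1500,
     "revenue_dkk": 2_300_000, "documentation": "e-commerce " * 12},
    {"customer": "Customer B", "technology": "Google", "hours": 400,
     "revenue_dkk": 1_800_000, "documentation": "e-commerce " * 20},
    {"customer": "Customer C", "technology": "Microsoft", "hours": 2000,
     "revenue_dkk": 5_000_000, "documentation": "e-commerce " * 9},
    {"customer": "Customer D", "technology": "Google", "hours": 1200,
     "revenue_dkk": 1_400_000,
     "documentation": "intranet search, one e-commerce pilot"},
]

MIN_MENTIONS = 5  # assumption: what counts as "mentioned many times"


def matching_customers(projects):
    """Customers with a project that satisfies every part of the query."""
    result = set()
    for p in projects:
        mentions = p["documentation"].lower().count("e-commerce")
        if (p["technology"] == "Google"
                and p["hours"] >= 1000
                and p["revenue_dkk"] > 1_000_000   # more than 1 MDKK
                and mentions >= MIN_MENTIONS):
            result.add(p["customer"])
    return sorted(result)


print(matching_customers(projects))
```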

Page 18: Learn more about Entity Extraction May 2014

Discover insights about your products | Monitor trends

Page 19: Learn more about Entity Extraction May 2014

Motivation

• Search for the product name ‘Tusin’

• Product is mentioned in different sources, under different contexts (user feedback, marketing material, internal specifications), and using different terminologies (on social media compared to website)

• How to keep track of all information?

• How easy is it to identify trends?

Page 20: Learn more about Entity Extraction May 2014

Identify the same product in different contexts

• Identify the entity denoting the same product from different sources

Internal name for the same product

Internal Production Specification

Product Marketing Material from Website

Feedback about the marketing material / the experience of the user

Mentions the product

User feedback

Metric

Internal Issues Management System
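Recognizing one product under several names can be sketched with an alias table that maps every known surface form to a canonical product. This is only an illustration: the aliases (including the internal code `PRJ-TSN-01`), the example texts, and the `resolve` helper are invented, and a real system would likely learn or curate such aliases rather than hard-code them.

```python
import re
from collections import defaultdict

# Hypothetical alias table: surface form (lowercase) -> canonical product.
ALIASES = {"tusin": "Tusin", "#tusin": "Tusin", "prj-tsn-01": "Tusin"}

# Hypothetical mentions, one per source type named on the slide.
mentions = [
    {"source": "Website marketing material",
     "text": "Tusin is now available in stores."},
    {"source": "Internal production specification",
     "text": "Batch notes for PRJ-TSN-01."},
    {"source": "User feedback",
     "text": "Loving #tusin so far!"},
    {"source": "Internal issues management system",
     "text": "Label misprint on another product."},
]


def resolve(mentions, aliases):
    """Group sources by the canonical product they mention."""
    by_product = defaultdict(list)
    for mention in mentions:
        tokens = re.findall(r"[#\w-]+", mention["text"].lower())
        products = {aliases[t] for t in tokens if t in aliases}
        for product in sorted(products):
            by_product[product].append(mention["source"])
    return dict(by_product)


print(resolve(mentions, ALIASES))
```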

Page 21: Learn more about Entity Extraction May 2014

Monitor trends on your products

• Search for ‘Tusin’

or

• Remember it as a search term and create a dashboard with content driven by search

Monitor trends

Reduce the time needed to reply to customers or users

Stay competitive

Page 22: Learn more about Entity Extraction May 2014

Prevent duplicate work | Find similar content

Page 23: Learn more about Entity Extraction May 2014

Motivation

• You have just started working on a new material at a construction company

• What is the cost of duplicating the work?

• Will you perform a search on previous work?

• What if another team has a similar initiative?

Page 24: Learn more about Entity Extraction May 2014

Enhanced Search Index

• Automatically extract entities and representative keywords from content

Documents

Announcements

Public Emails

Newsfeed

Steel Structures

Glass Type 1.A

Project ANSA

Torso Tower

Polyethylene Terephthalate

Page 25: Learn more about Entity Extraction May 2014

Prevent duplicate work

• Get suggestions of similar work based on extracted entities

Identify similar work early in the project

Identify potential collaborations

Prevent duplicate work
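A common way to suggest similar work from extracted entities is to compare the entity sets of two pieces of work, for example with Jaccard similarity (shared entities divided by all entities). The sketch below assumes that measure; the slide does not say which one is used. The entity names come from the previous slide, read here as two project names and three materials, but their assignment to projects, the third project, and the 0.5 threshold are invented.

```python
def jaccard(a, b):
    """Share of entities two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical mapping from a piece of work to its extracted entities.
projects = {
    "Project ANSA": {"Steel Structures", "Glass Type 1.A",
                     "Polyethylene Terephthalate"},
    "Torso Tower": {"Steel Structures", "Glass Type 1.A"},
    "Intranet redesign": {"usability"},
}


def similar_work(name, projects, threshold=0.5):
    """Other projects whose entity overlap reaches the assumed threshold."""
    target = projects[name]
    scored = [(other, jaccard(target, entities))
              for other, entities in projects.items() if other != name]
    return sorted([(o, s) for o, s in scored if s >= threshold],
                  key=lambda pair: -pair[1])


for other, similarity in similar_work("Project ANSA", projects):
    print(f"{other}: {similarity:.2f}")
```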

Page 26: Learn more about Entity Extraction May 2014

Visualize your content in a new way | Enrich documents with metadata

Page 27: Learn more about Entity Extraction May 2014

Motivation

• Search for “financial results Copenhagen”

• Search results: documents

• Clicking on a result opens the document

• Does this search answer the user question?

Page 28: Learn more about Entity Extraction May 2014

Identify entities in documents

• Identify locations, revenues, departments, etc. from semi-structured data

• Combine with data in spreadsheets or databases

Documents

Database

Spreadsheets

Answer

Page 29: Learn more about Entity Extraction May 2014

Visualize your content in a new way

• Search for “financial results Copenhagen”

• Additional information shown

• Can show computed results

Enrich documents with metadata

Visualise the content

Compute answers

Make comparisons

Create dashboards based on searches

Page 30: Learn more about Entity Extraction May 2014

Help your users find their dream home | Extract potential decision criteria from natural language

Page 31: Learn more about Entity Extraction May 2014

Motivation

• Searching for an ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’

• The apartment information consists mostly of structured data (m², number of rooms, postal code, floor)

• Can we improve the search experience?

Long list of static filters

Search query consists of an area (post code, street etc.)

Page 32: Learn more about Entity Extraction May 2014

Understanding what the users want

• Here’s how Facebook helps users define their queries:

• Can we interpret the query ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’?

Page 33: Learn more about Entity Extraction May 2014

Understanding what the users want

• Searching for ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’

• Apartments with 3 rooms are shown in search results, but those with fewer are not excluded

• Those that mention shopping outlets (such as Netto or Fakta) are boosted

Interpret natural language

Boost results based on ‘preferences’

Better search experience

Increase user satisfaction

Boost those with 3 rooms (a boost on the map can be represented by a bigger pointer)

Free text search
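The difference between a preference and a hard filter can be sketched like this: every listing keeps a base score and stays in the results, while matched preferences only add to the score. The listings, the boost values, and the `parse_preferences`, `score`, and `rank` helpers are assumptions for illustration; only the query and the shop names (Netto, Fakta) come from the slides.

```python
import re

# Hypothetical listings; descriptions are invented.
listings = [
    {"id": "A", "rooms": 3,
     "description": "Bright flat with a good view, close to Netto."},
    {"id": "B", "rooms": 2,
     "description": "Cosy flat near Fakta and the metro."},
    {"id": "C", "rooms": 3,
     "description": "Quiet street, newly renovated bathroom."},
    {"id": "D", "rooms": 4,
     "description": "Large family apartment."},
]

SHOPS = ("netto", "fakta")  # shop names mentioned on the slide


def parse_preferences(query):
    """Pull a few decision criteria out of the free-text query."""
    lowered = query.lower()
    prefs = {"shopping": "shopping" in lowered, "view": "good view" in lowered}
    match = re.search(r"(\d+)\s+rooms", lowered)
    if match:
        prefs["rooms"] = int(match.group(1))
    return prefs


def score(listing, prefs):
    """Base score keeps every listing in; preferences only boost."""
    text = listing["description"].lower()
    total = 1.0
    if prefs.get("rooms") == listing["rooms"]:
        total += 2.0  # assumed boost for the preferred room count
    if prefs["shopping"] and any(shop in text for shop in SHOPS):
        total += 1.0
    if prefs["view"] and "good view" in text:
        total += 1.0
    return total


def rank(listings, query):
    prefs = parse_preferences(query)
    return sorted(listings, key=lambda l: score(l, prefs), reverse=True)


query = ("apartment with a good view, located in central Copenhagen, "
         "well sized bathroom, close to shopping outlets, "
         "preferably with 3 rooms")
print([l["id"] for l in rank(listings, query)])
```

Listings with two or four rooms still appear, just lower in the ranking, which matches the "boosted but not excluded" behaviour described above.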

Page 34: Learn more about Entity Extraction May 2014

Behind the scenes

Page 35: Learn more about Entity Extraction May 2014

Entity Extraction

Entity extraction is the process of identifying named entities (such as locations, people, companies) in a block of text

Add structure to unstructured data

New possibilities of interpreting the data

Improve data quality and findability of documents

Reduce time spent by users manually structuring content

Page 36: Learn more about Entity Extraction May 2014

Entity Extraction Framework

Combines dictionaries with trained models and regular expressions, depending on needs

Scalable, adaptable and extendable framework

Automatically enrich documents with named entities

Iterative approach to continuously improve accuracy

Built by Findwise in response to our customers' requirements and vision
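To make the combination of dictionaries and regular expressions concrete, here is a minimal sketch of an extractor that uses both. It does not reproduce the Findwise framework, whose internals the slides do not show, and it leaves out the trained-model component entirely. The dictionary entries, labels, date pattern, and example sentence are assumptions.

```python
import re

# Dictionary lookup: known names mapped to an entity type (assumed labels).
DICTIONARY = {
    "Copenhagen": "LOCATION", "Philadelphia": "LOCATION",
    "Findwise": "COMPANY", "Google": "COMPANY", "Microsoft": "COMPANY",
}

# Regular expressions for entities with a predictable shape.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text):
    """Return (text, label) pairs in order of appearance."""
    found = []
    for name, label in DICTIONARY.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.group(), label))
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), label))
    return [(value, label) for _, value, label in sorted(found)]


# Invented example sentence.
print(extract_entities("Workshop with Findwise in Copenhagen on 2014-05-20."))
```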

Page 37: Learn more about Entity Extraction May 2014

Entity Extraction Framework

Autotag | Edit | Evaluate | Incremental train

90% accuracy: The Danish and Swedish entity extractors can reach 90% accuracy
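The "Evaluate" step of such a cycle typically compares the extractor's output with manually corrected annotations. The slide does not define how its accuracy figure is measured, so the sketch below uses precision, recall, and F1 as a common choice for entity extraction; the gold and predicted annotations and the `evaluate` helper are invented.

```python
def evaluate(predicted, gold):
    """Precision, recall and F1 of predicted (text, label) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one document.
gold = {
    ("Sarah Jensen", "PERSON"), ("Copenhagen", "LOCATION"),
    ("Findwise", "COMPANY"), ("Google", "COMPANY"),
}
predicted = {
    ("Sarah Jensen", "PERSON"), ("Copenhagen", "LOCATION"),
    ("Findwise", "COMPANY"), ("Philadelphia", "COMPANY"),  # wrong label
}

precision, recall, f1 = evaluate(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running this after each round of editing and incremental training shows whether the added annotations actually improve the extractor.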

Page 38: Learn more about Entity Extraction May 2014

Graphical Annotation Tool

Visual representation of annotated documents

Annotate more documents to improve precision

Easy-to-use, point and click interaction

Built by Findwise in response to our customers' requirements and vision

Page 39: Learn more about Entity Extraction May 2014

Graphical Annotation Tool

Page 40: Learn more about Entity Extraction May 2014

Anders Hä[email protected]