Five Creative Search Solutions using Text Analytics

Joe Hilger, Principal at Enterprise Knowledge

Posted on 15-Jul-2015

EK Search Best Practices

• There are three primary types of search: find, discovery and legal. Determine which type best fits your needs.

• Facets are the most important feature of search. Properly designed facets solve most search relevancy problems.

• Search is a means to an end. Each result type should be designed to support the action the person is trying to accomplish.

• Search is most powerful when it is used to summarize and query information.
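
Facet counts are what let a result page summarize a result set and let people narrow it. The sketch below is a minimal illustration in Python, assuming each indexed document is a plain dictionary of field values; the field names (`type`, `vendor`) and sample records are hypothetical, and a real search engine would compute these counts inside its index rather than in application code.

```python
from collections import Counter

# Hypothetical indexed documents; any field can be exposed as a facet.
DOCS = [
    {"title": "Contract A", "type": "contract", "vendor": "Acme"},
    {"title": "Contract B", "type": "contract", "vendor": "Globex"},
    {"title": "Memo C", "type": "memo", "vendor": "Acme"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of a field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filters(docs, filters):
    """Keep only documents whose fields match every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in filters.items())
    ]


if __name__ == "__main__":
    print(facet_counts(DOCS, "vendor"))
    narrowed = apply_filters(DOCS, {"type": "contract"})
    print(facet_counts(narrowed, "vendor"))
```

Recomputing the counts after each filter is what keeps the facet values consistent with the narrowed result set.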

Methods of Text Analytics

Natural Language Processing: Tools that use sentence structure to understand context and identify people, places and things within text.

Entity Recognition: Identification of people, places and things through a combination of pattern matching, dictionaries and graph databases.

Statistical Algorithms: Statistical models that identify the topics that distinguish a piece of content from the rest of a larger corpus of documents.

Pattern Recognition: Extraction of information based on the structure of the document or of the information within it.

Auto-categorization: The use of any of the above methods to automatically associate content with a predefined taxonomy.
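
Two of these methods can be sketched with nothing more than a dictionary and a set of rules. The Python below is a minimal illustration, not how any particular text analytics product works: `ENTITY_DICTIONARY` and `TAXONOMY_RULES` are hypothetical, matching is single-word only, and a real tool would add NLP tokenization, multi-word entities and disambiguation.

```python
import re

# Hypothetical dictionary of known entities mapped to their entity type.
ENTITY_DICTIONARY = {
    "microsoft": "organization",
    "oracle": "organization",
    "washington": "place",
}

# Hypothetical taxonomy: a category applies if any of its keywords appears.
TAXONOMY_RULES = {
    "Software": {"license", "software", "subscription"},
    "Services": {"consulting", "support", "training"},
}


def tokenize(text):
    """Lower-cased word tokens (a crude stand-in for an NLP tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def recognize_entities(text):
    """Dictionary-based entity recognition: (token, entity type) pairs."""
    return [(tok, ENTITY_DICTIONARY[tok]) for tok in tokenize(text)
            if tok in ENTITY_DICTIONARY]


def categorize(text):
    """Rule-based auto-categorization against the taxonomy keywords."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, keywords in TAXONOMY_RULES.items()
                  if tokens & keywords)


if __name__ == "__main__":
    sample = "Oracle software license and support for the Washington office."
    print(recognize_entities(sample))
    print(categorize(sample))
```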

#1. Contract Analysis

The Problem: A large government agency needs to understand how much money it spends on different software applications. Unfortunately, software is bought by decentralized business units and purchased through larger contract vehicles, so the total spent on any one product is hard to see.

The Solution: Use text analytics to read contracts and extract vendor names, software product names, contract amounts, and contract start and end dates. This information can be used to create an analytics-based search solution that shows charts and graphs of spending on software products.
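
The extraction step combines dictionaries (vendors, products) with pattern recognition (amounts, dates). The Python sketch below is hypothetical throughout: the vendor and product lists, the `$120,000.00` amount format and the `from MM/DD/YYYY to MM/DD/YYYY` term format are assumptions, and splitting a multi-product contract evenly across its products is an illustrative simplification, not the agency's method.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical dictionaries; a real deployment would load a vendor list
# and a software product catalog.
KNOWN_VENDORS = ["Microsoft", "Oracle", "Tableau Software"]
KNOWN_PRODUCTS = ["SharePoint", "Oracle Database", "Tableau"]

# Assumed formats: "$120,000.00" and "from 10/01/2014 to 09/30/2015".
AMOUNT_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")
DATE_RANGE_RE = re.compile(
    r"from\s+(\d{1,2}/\d{1,2}/\d{4})\s+(?:to|through)\s+(\d{1,2}/\d{1,2}/\d{4})",
    re.IGNORECASE,
)


def parse_date(value):
    return datetime.strptime(value, "%m/%d/%Y").date()


def extract_contract(text):
    """Pull vendors, products, the first dollar amount and the term dates."""
    lowered = text.lower()
    amount = AMOUNT_RE.search(text)
    term = DATE_RANGE_RE.search(text)
    return {
        "vendors": [v for v in KNOWN_VENDORS if v.lower() in lowered],
        "products": [p for p in KNOWN_PRODUCTS if p.lower() in lowered],
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "start": parse_date(term.group(1)) if term else None,
        "end": parse_date(term.group(2)) if term else None,
    }


def spend_by_product(contracts):
    """Total spend per product, splitting a contract evenly across its products."""
    totals = defaultdict(float)
    for contract in contracts:
        if contract["amount"] is None or not contract["products"]:
            continue
        share = contract["amount"] / len(contract["products"])
        for product in contract["products"]:
            totals[product] += share
    return dict(totals)
```

The dictionary returned by `spend_by_product` is the kind of aggregate a charting layer on top of the search index would plot.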

#1. Contract Analysis - Results

#2. Company Information Search

The Problem: A financial news provider needs to create a more compelling search for its paid subscribers. Subscribers need consolidated information about companies that is easy to find.

The Solution: Use text analytics to build a database of every company mentioned in the articles, newsletters, financial reports and analyst reports. The search engine indexes this database, and the companies are exposed in the type-ahead list and as quick answers in search.
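
Once the company names have been extracted, the type-ahead list reduces to a prefix lookup. A minimal Python sketch, assuming the extracted names fit in memory as a sorted list; the `CompanyTypeahead` class and the sample names are hypothetical, and a production search engine would normally serve suggestions from its own index.

```python
import bisect


class CompanyTypeahead:
    """Prefix suggestions over a sorted list of extracted company names."""

    def __init__(self, companies):
        # Sorted (lower-cased key, display name) pairs enable binary search.
        self._entries = sorted((name.lower(), name) for name in set(companies))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Jump to the first key >= prefix, then scan while keys still match.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key, name in self._entries[start:]:
            if not key.startswith(prefix) or len(results) >= limit:
                break
            results.append(name)
        return results


if __name__ == "__main__":
    typeahead = CompanyTypeahead(
        ["Apple Inc.", "Applied Materials", "Amazon.com", "Alphabet Inc."]
    )
    print(typeahead.suggest("app"))
```

Because matching keys are contiguous in sorted order, each lookup costs a binary search plus at most `limit` steps.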

#2. Company Information Search - Results

#3. Medical Research Search

The Problem: A medical researcher needs to understand the most commonly recommended treatments for diseases. They have access to hundreds of thousands of medical journal articles, but no way to summarize the information without reading every article.

The Solution: A text analytics tool identifies diseases, treatments, diagnostic methods and symptoms within every article stored in the medical database. The search results display how frequently each disease and its recommended treatments are found in the articles.
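
Given articles already tagged with extracted diseases and treatments, the frequency summary is a counting problem. The Python below is a sketch under one loud assumption: a treatment counts toward a disease whenever both are mentioned in the same article, which is only a crude proxy for "recommended". The disease and treatment names are placeholders, not medical data.

```python
from collections import Counter, defaultdict


def treatment_frequencies(articles):
    """For each disease, count the articles that also mention each treatment."""
    counts = defaultdict(Counter)
    for article in articles:
        # Sets ensure each article counts at most once per disease/treatment pair.
        treatments = set(article["treatments"])
        for disease in set(article["diseases"]):
            counts[disease].update(treatments)
    return counts


# Placeholder articles as a text analytics tool might tag them.
ARTICLES = [
    {"diseases": ["Disease A"], "treatments": ["Treatment X", "Treatment Y"]},
    {"diseases": ["Disease A"], "treatments": ["Treatment X"]},
    {"diseases": ["Disease A", "Disease B"], "treatments": ["Treatment X"]},
]

if __name__ == "__main__":
    freqs = treatment_frequencies(ARTICLES)
    print(freqs["Disease A"].most_common())
```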

#3. Medical Research Search - Results

http://pairs.demo.marklogic.com/

#4. Publishing Knowledge Base

The Problem: People want to understand where research is being done on medical topics that are important to them. A scientific and technical publisher has a large corpus of information that would make this possible if the content could be summarized.

The Solution: The publisher uses a text analytics tool to identify the author, date, institution and location responsible for each article or publication. It developed a search that displays all of this information organized by the location where the content was published.
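
With author, date, institution and location extracted, a location-based view is a group-and-summarize step. A minimal Python sketch, assuming each publication is a dictionary with those fields; the field names, sample records and the "Unknown" bucket for records without a location are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical extracted metadata, one dictionary per publication.
RECORDS = [
    {"author": "Author A", "institution": "Institute 1", "location": "City A", "year": 2014},
    {"author": "Author B", "institution": "Institute 1", "location": "City A", "year": 2015},
    {"author": "Author C", "institution": "Institute 2", "location": "City B", "year": 2015},
    {"author": "Author D", "institution": "Institute 3", "location": None, "year": 2013},
]


def summarize_by_location(records):
    """Group publications by location and summarize each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.get("location") or "Unknown"].append(record)
    return {
        location: {
            "publications": len(items),
            "institutions": sorted({r["institution"] for r in items if r.get("institution")}),
            "authors": sorted({r["author"] for r in items if r.get("author")}),
            "years": sorted({r["year"] for r in items if r.get("year")}),
        }
        for location, items in sorted(grouped.items())
    }


if __name__ == "__main__":
    for location, summary in summarize_by_location(RECORDS).items():
        print(location, summary)
```

Each location key would correspond to a point on a map, with its summary shown when the point is selected.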

#4. Publishing Knowledge Base - Results

http://authormapper.com/

#5. Improved Knowledge Sharing

The Problem: Science.gov was created to better disseminate scientific information to the American public. Science.gov had access to hundreds of thousands of articles, but needed a way to make the information easier to find and understand.

The Solution: Use statistical text analytics to identify the relevant topic(s) for each article. Create a search that allows people to filter their results by topic in an appealing graphical format.
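
One common statistical way to find the terms that distinguish an article from the rest of a corpus is TF-IDF. The Python below is a sketch of that swapped-in technique, not a description of what Science.gov actually runs; the stop-word list and sample texts are hypothetical, and terms that appear in every document score zero and are dropped.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list.
STOPWORDS = {"the", "of", "and", "a", "in", "on", "for", "to", "is"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(corpus, k=3):
    """Return up to k highest TF-IDF terms per document (ties broken alphabetically)."""
    docs = [Counter(tokenize(text)) for text in corpus]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in doc)
    results = []
    for doc in docs:
        total = sum(doc.values())
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in doc.items()
        }
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append([term for term, score in ranked[:k] if score > 0])
    return results


if __name__ == "__main__":
    corpus = [
        "Solar energy research on solar panels",
        "Genome sequencing research in plants",
        "Ocean temperature research and climate",
    ]
    print(top_terms(corpus))
```

In practice the top terms would be mapped onto topic labels that the search then exposes as graphical filters.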

#5. Improved Knowledge Sharing - Results

http://www.science.gov/

Building Blocks for Search Success

• Action-oriented result types

• Entity Extraction

• Taxonomies

• Auto-generation of meaningful content

• Augment your information with content from external data sources
