II-SDV 2014: Automated Relevancy Check of Patents and Scientific Literature (Katrin Tomanek and...

Post on 11-May-2015


Dr. Philipp Daumke

Analyze Text, Gain Answers

ABOUT AVERBIS

Founded: 2007

Location: Freiburg im Breisgau

Team: Domain and IT experts

Focus: Leverage structured & unstructured information

Current Sectors: Pharma, Health, Automotive, Publishers & Libraries

PORTFOLIO

PRODUCTS:

CORE TECHNOLOGIES:

CHALLENGE

Exponential growth of data

• need for data-driven decisions

• limited human resources for analysis

New analytics tools needed for

• Semantic search and discovery

• Competitor analysis

• Identification of market trends

• IP landscaping

• Portfolio analysis

• …

Patent applications:

Medline articles:

• (Semi-)automate patent categorization with high precision

• Learning system: imitates the behavior of IP professionals

• Semantic search: search for meanings, not just keywords

PATENT ANALYTICS

[Diagram labels: Terminologies, Text Mining, Rules, Machine Learning, Patent Collection]

TERMINOLOGY MANAGEMENT

Define the ‘semantic space’ of your technology fields

• Keywords

• Categories

• Hierarchies

• …

Include relevant word lists from your company

• Products

• Devices

• Companies

• Components

• Indications

• …

Reuse already existing terminologies on the market

TEXT MINING

Lung metastasis:

lung metastasis

lung metastases

metastases in the lung

metastases in the lower lobe of the lung

pulmonal metastases

pulmonal relapse of a metastasis

pulmonal filia

pulmonal filiae

lung filiae

lower lobe filiae
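The variants above all denote one concept. A minimal sketch of how such surface forms might be mapped to a single concept ID, assuming two small hypothetical word lists (an anatomy facet and a finding facet) and a made-up identifier `LUNG_METASTASIS`. It illustrates the idea only and is not the Averbis implementation.

```python
import re

# Hypothetical facet vocabularies; a real terminology would be far larger.
ANATOMY = ("lung", "pulmonal", "pulmonary", "lower lobe")
FINDING = ("metastasis", "metastases", "filia", "filiae")
CONCEPT_ID = "LUNG_METASTASIS"  # illustrative identifier, not a real code


def _mentions(text, phrases):
    """True if any phrase occurs in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def find_concept(text):
    """Return the concept ID if the text names both anatomy and finding."""
    low = text.lower()
    if _mentions(low, ANATOMY) and _mentions(low, FINDING):
        return CONCEPT_ID
    return None


VARIANTS = [
    "lung metastasis",
    "metastases in the lower lobe of the lung",
    "pulmonal relapse of a metastasis",
    "lower lobe filiae",
]
for variant in VARIANTS:
    print(variant, "->", find_concept(variant))
```

A sketch this small over-matches: "lower lobe" alone does not guarantee the lung, so a real system would need context or a richer terminology to disambiguate.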

TEXT MINING

Tumors:

tumour

cancer

carcinoma

lymphoma

endometrioma

astrocytoma

glioblastoma

seminoma

ALL

leukemia
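A search for "tumors" should also find texts that only mention a narrower term. The sketch below expands a query through a small hierarchy before matching. The hierarchy itself is an assumption made for illustration (the slide lists the terms flat); placing the abbreviation ALL (acute lymphoblastic leukemia) under leukemia is likewise an illustrative choice.

```python
import re

# Hypothetical mini-hierarchy: each concept lists its narrower terms.
NARROWER = {
    "tumor": ["tumour", "cancer", "carcinoma", "lymphoma", "endometrioma",
              "astrocytoma", "glioblastoma", "seminoma", "leukemia"],
    "leukemia": ["ALL"],
}


def expand(term):
    """Return the term plus all transitively narrower terms."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def matches(query, text):
    """True if the text mentions the query term or any narrower term."""
    for term in expand(query):
        # Upper-case abbreviations such as "ALL" are matched case-sensitively,
        # so the ordinary word "all" does not count as a hit.
        flags = 0 if term.isupper() else re.IGNORECASE
        if re.search(r"\b" + re.escape(term) + r"s?\b", text, flags):
            return True
    return False


print(matches("tumor", "Patients with glioblastoma were included"))
print(matches("tumor", "all patients recovered"))
```

Matching the abbreviation case-sensitively is one simple way to handle the ambiguity between "ALL" and "all"; real systems typically use more context than capitalization.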

PATENT CLASSIFICATION – MACHINE LEARNING

System learns how to fine-classify patents

• Observes and imitates human decision making

Advantages

• No explicit externalization of knowledge needed

• No rule-writing

• Better results

• System generalizes (higher recall)

• Statistical model can handle "noise" better than rules

• Ambiguity and textual variations better handled

THE PROCESS OF MACHINE LEARNING

Labeling

• Up to 100 categories

• ~10-50 patents per category

• Hierarchical categories

• Multi-labeling

Learning

• Learn characteristic patterns in labeled data

• Lots of different classification algorithms

Prediction & Review

• Automatically map new patents to categories

• Confidence value for each category

• Different selection criteria
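The three steps above (labeling, learning, prediction with confidence values) can be sketched end to end. The slides do not say which algorithm is used, so this example swaps in a deliberately simple one: a nearest-centroid classifier over bag-of-words vectors, with cosine similarity standing in for the confidence value. The category names, training texts and the `threshold` / `top_k` parameters are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def bag(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled):
    """labeled: list of (text, set of categories) -> {category: centroid}."""
    centroids = defaultdict(Counter)
    for text, categories in labeled:
        for cat in categories:  # multi-labeling: a text may feed several categories
            centroids[cat].update(bag(text))
    return dict(centroids)


def predict(model, text, threshold=0.3, top_k=None):
    """Return (category, confidence) pairs, best first.

    Two selection criteria: keep the top_k categories if top_k is given,
    otherwise keep every category whose confidence reaches the threshold.
    """
    vec = bag(text)
    ranked = sorted(((c, cosine(vec, cen)) for c, cen in model.items()),
                    key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        return ranked[:top_k]
    return [(c, s) for c, s in ranked if s >= threshold]


labeled = [
    ("battery cell with lithium anode", {"battery"}),
    ("lithium battery charging circuit", {"battery", "electronics"}),
    ("circuit board with amplifier", {"electronics"}),
]
model = train(labeled)
for category, confidence in predict(model, "lithium battery anode"):
    print(f"{category}: {confidence:.2f}")
```

A reviewer could raise `threshold` to see only high-confidence assignments, or set `top_k=1` to force a single category per patent.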

POWERFUL FRONTEND

Linguistic full text search

Linguistic filters

Patent Summary

Additional info, e.g. picture

Multilabel Classification

USE CASE 1: LARGE-SCALE PATENT LANDSCAPING

• Goal: semi-automatically categorize patents into the company's technology landscape

• Technology landscape: 35 classes (8 main classes, 27 subclasses)

• 7,000 patents, 10 competitors

• Evaluation

– agreement between automated judgement and expert judgement

– agreement between two expert judgements (inter-rater agreement)
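Both comparisons come down to measuring how often two sets of judgements agree. The sketch below computes plain percent agreement and, as an addition not mentioned on the slides, Cohen's kappa, which corrects agreement for chance. It assumes one label per patent; the example labels are invented.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters (single-label case)."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if both raters labeled independently
    # with their own label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


expert = ["A", "A", "B", "B"]     # hypothetical expert labels
system = ["A", "B", "B", "B"]     # hypothetical automated labels
print(percent_agreement(expert, system))
print(cohens_kappa(expert, system))
```

The same two functions apply to the expert-versus-expert comparison by passing the second expert's labels instead of the system's. Multi-label judgements would need a per-label comparison instead.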

CONFUSION MATRIX (figure not reproduced)

Results                     Accuracy   Time Savings
Automated, Scenario I       85%        70%
Automated, Scenario II      82%        80%
Manual (2 expert judges)    80%

Averbis Patent Analytics saves up to 80% of time, with accuracy on par with manual judges!

USE CASE 2: RESEARCH LITERATURE RELEVANCY

• Goal: automatically identify literature relevant to the company

• Rule set:

– Mentions of the company's indications, products, etc.

– Competitor products and indications

– "Testosterone, but only given externally"

– "Products shall not be found in an enumeration"

– …

PATENT ANALYTICS

[Diagram labels: Rule Set; Text Mining, Machine Learning; Search, Analysis; Medline, Embase]

VERAPAMIL

Rule: Testosterone, but only given externally
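One way to approximate this rule is to require an administration cue in the same sentence as the drug name. The sketch below does that with a hypothetical cue list (`ADMIN_CUES`) and a naive sentence splitter. The slides do not describe how the actual rule is implemented, so treat this purely as an illustration of a context-dependent rule.

```python
import re

# Hypothetical cue phrases suggesting the hormone was administered.
ADMIN_CUES = ("administered", "injection", "gel", "patch",
              "replacement therapy", "supplementation", "exogenous")
CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in ADMIN_CUES) + r")\b",
    re.IGNORECASE,
)
DRUG_PATTERN = re.compile(r"\btestosterone\b", re.IGNORECASE)


def sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def mentions_external_testosterone(text):
    """True if some sentence names testosterone together with an administration cue."""
    return any(DRUG_PATTERN.search(s) and CUE_PATTERN.search(s)
               for s in sentences(text))


print(mentions_external_testosterone("Patients received testosterone gel daily."))
print(mentions_external_testosterone("Serum testosterone levels were measured."))
```

Restricting the check to a single sentence keeps unrelated cues elsewhere in an abstract from triggering the rule, at the cost of missing cues that appear one sentence later.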

Rule: Ignore products listed in enumerations
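A rough sketch of this rule: treat a product mention as irrelevant when it sits inside a comma-separated list of single-word items that ends in "and" or "or". The three-item minimum and the regular expression are assumptions for illustration. The example sentences are invented, using verapamil as the product.

```python
import re

# A list of at least three single-word items: "a, b and c" or "a, b, or c".
ENUMERATION = re.compile(
    r"\b\w+(?:\s*,\s*\w+)+\s*,?\s*\b(?:and|or)\b\s+\w+",
    re.IGNORECASE,
)


def _product_pattern(product):
    return re.compile(r"\b" + re.escape(product) + r"\b", re.IGNORECASE)


def in_enumeration(product, sentence):
    """True if the product occurs inside a detected enumeration."""
    pattern = _product_pattern(product)
    return any(pattern.search(m.group(0)) for m in ENUMERATION.finditer(sentence))


def is_relevant_mention(product, sentence):
    """True if the product is mentioned and not merely listed among others."""
    mentioned = _product_pattern(product).search(sentence) is not None
    return mentioned and not in_enumeration(product, sentence)


print(is_relevant_mention(
    "verapamil", "Verapamil reduced blood pressure in hypertensive patients."))
print(is_relevant_mention(
    "verapamil",
    "Calcium channel blockers such as nifedipine, diltiazem, verapamil "
    "and amlodipine were excluded."))
```

The pattern only recognizes single-word list items, so multi-word or hyphenated product names in a list would slip through. A production rule would more plausibly operate on annotated entities than on raw text.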

USE CASE 3: SOCIAL MEDIA ANALYTICS

Main challenge: what is positive, what is negative?

– "Could somebody please remove the dead bird from the balcony?"

– "From the breadcrumbs lying under the bed one could live for ages"

– "The hotel is situated in the most crowded party district of the town"

– "The toilets were so big that I couldn't sit down for …"
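To see why such sentences are hard, consider a naive word-list scorer. The sketch below counts words from two small hypothetical lexicons (`POSITIVE`, `NEGATIVE`). Complaints phrased without explicit negative words come out neutral or even positive, which is the problem the examples above illustrate.

```python
import re

# Hypothetical polarity lexicons, deliberately tiny.
POSITIVE = {"good", "great", "clean", "friendly", "big", "spacious"}
NEGATIVE = {"bad", "dirty", "rude", "noisy", "terrible"}


def lexicon_score(text):
    """Positive-word count minus negative-word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


reviews = [
    "The room was dirty and the staff rude",
    "From the breadcrumbs lying under the bed one could live for ages",
    "The toilets were so big that I couldn't sit down",
]
for review in reviews:
    print(lexicon_score(review), review)
```

Only the first review contains explicit polarity words; the other two are complaints whose negativity lies in irony or implication, so a word list alone cannot detect it.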

USE CASE 4: PATIENT RECRUITMENT/DIAGNOSIS SUPPORT

[Diagram labels: Disease Profiles, Inclusion/Exclusion Criteria, Categorization, Visualization, Electronic Health Records]

For further questions, please contact

Dr. Philipp Daumke

philipp.daumke@averbis.com

+49 761 - 203 9769 0
