ECIR 2014 Industry Day: Content Discovery Through Entity Driven Search. Alessandro Benedetti (http://uk.linkedin.com/in/alexbenedetti), Antonio David Perez Morales (http://es.linkedin.com/in/adperezmorales). 16th April 2014


Page 1: Content Discovery Through Entity Driven Search

ECIR 2014 Industry Day

Content Discovery Through Entity Driven Search

Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti

Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales

16th April 2014

Page 2: Content Discovery Through Entity Driven Search

• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle

• Alfresco & Ephesoft certified Platinum Partner

• Red Hat Enterprise Linux Ready Partner

• Crafter & Varnish Gold Partners

• Search Solutions Consultant

• Alfresco Partner of the Year 2012 and 2013

Page 3: Content Discovery Through Entity Driven Search

Working effectively together

Who We Are

Antonio David Pérez Morales

- R&D Senior Engineer
- Master in Software Engineering and Technology
- Digital Identity and Security expert
- Enterprise Search background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor

@adperezmorales
http://es.linkedin.com/in/adperezmorales/

Alessandro Benedetti

- R&D Senior Engineer
- Master in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover

@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti

Page 4: Content Discovery Through Entity Driven Search

Agenda

• Context

• Problem

• Solution

• Demo

• Future Works

Page 6: Content Discovery Through Entity Driven Search

Zaizi R&D Department

• Giving sense to the content

• Enriching it semantically

• Adding value to ECM/CMS

• More structured content, easy to manage, link and search

• Improving search

• Across different domains, data sources, User Experience

• Machine Learning applied research

• Content Organization – Recommendation Systems

Page 8: Content Discovery Through Entity Driven Search

Enterprise Search Problems

Challenge: Search within Big and Heterogeneous Repositories

• Heterogeneous Data Sources

• Filesystem, DB, ECM/CMS, Email, …

• Unstructured Content

• PDFs, plain text, Word, …

• Documents not linked between each other

• Federated Search needed

• Search across data sources

• Different permissions

• Centralized endpoint
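
The federated requirement above (one endpoint, several data sources, per-document permissions) can be sketched roughly as follows. This is only an illustration: the `Hit` type, the two source functions, the group names, and the assumption that scores are comparable across sources are all hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str                # which repository the hit came from
    doc_id: str
    score: float               # assumed to be comparable across sources
    allowed_groups: frozenset  # groups permitted to read the document


def federated_search(query, sources, user_groups):
    """Query every source, drop hits the user cannot read, merge by score."""
    merged = []
    for search in sources:
        for hit in search(query):
            # Permission check: keep the hit only if the user shares a group.
            if hit.allowed_groups & user_groups:
                merged.append(hit)
    return sorted(merged, key=lambda h: h.score, reverse=True)


# Two toy sources standing in for a file system and an ECM repository.
def filesystem_source(query):
    return [Hit("filesystem", "report.pdf", 0.7, frozenset({"staff"}))]


def ecm_source(query):
    return [
        Hit("ecm", "contract.docx", 0.9, frozenset({"legal"})),
        Hit("ecm", "memo.txt", 0.4, frozenset({"staff", "legal"})),
    ]


results = federated_search(
    "budget", [filesystem_source, ecm_source], frozenset({"staff"})
)
print([(h.source, h.doc_id) for h in results])
```

The highest-scoring ECM hit is dropped here because the user is not in its group, which is why permissions have to be applied before results from different sources are merged.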

Page 9: Content Discovery Through Entity Driven Search

Current Enterprise Search Weaknesses

• Keyword based

• Low precision

• Ambiguous terms not in context

• Inaccurate weighting when keywords are combined in a query

Page 11: Content Discovery Through Entity Driven Search

Entity Driven Search

• Moves from keywords to entities

• More understandable to a human

• Process the unstructured text

• Enrich it

• Build specific indexes

• Use entities and concepts in searches

Page 12: Content Discovery Through Entity Driven Search

Sensefy

• Semantic Enterprise Search Engine

• Federated Search

• Evolved User Experience

• Based on cutting-edge Open Source Frameworks

Page 13: Content Discovery Through Entity Driven Search

Architecture

Page 14: Content Discovery Through Entity Driven Search

RedLink

• Semantic Cloud platform

• Providing Software as a Service

• Manage unstructured data

• Extract knowledge and intelligence

• Make sense of information

• Feed into business processes

• Open-Source based components

• Entity Linking using Knowledge Bases

Page 15: Content Discovery Through Entity Driven Search

NLP & Semantic Enrichment

• From unstructured to structured

• NLP analysis: POS tagging

• Named Entity Recognition

• Linked Data

• Entity Linking using Knowledge Bases

• Disambiguation

• Indexing in Solr
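
As a rough illustration of the pipeline above, the toy sketch below spots entity mentions, links them to a knowledge base, disambiguates by context overlap, and produces a flat document that could then be indexed. It is not the Sensefy or RedLink implementation: the knowledge base, the `kb:` URIs, the field names, and the overlap heuristic are all assumptions, and POS tagging is skipped in favour of a plain dictionary lookup.

```python
import re

# Hypothetical mini knowledge base: surface form -> candidate entities.
KNOWLEDGE_BASE = {
    "paris": [
        {"uri": "kb:Paris_France", "type": "City",
         "context": {"france", "seine", "capital"}},
        {"uri": "kb:Paris_Hilton", "type": "Person",
         "context": {"hotel", "celebrity", "heiress"}},
    ],
    "london": [
        {"uri": "kb:London", "type": "City",
         "context": {"england", "thames", "capital"}},
    ],
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def enrich(doc_id, text):
    """Spot known surface forms, pick the best candidate, build an index document."""
    tokens = tokenize(text)
    token_set = set(tokens)
    entities = []
    for token in tokens:
        candidates = KNOWLEDGE_BASE.get(token)
        if not candidates:
            continue
        # Disambiguation: prefer the candidate whose context overlaps the text most.
        best = max(candidates, key=lambda c: len(c["context"] & token_set))
        if best["uri"] not in [e["uri"] for e in entities]:
            entities.append(best)
    # Flat, field-oriented document; the field names are illustrative only.
    return {
        "id": doc_id,
        "text": text,
        "entity_uris": [e["uri"] for e in entities],
        "entity_types": sorted({e["type"] for e in entities}),
    }


doc = enrich("doc-1", "Paris is the capital of France, on the Seine.")
print(doc["entity_uris"], doc["entity_types"])
```

Keeping entity URIs and entity types in their own fields is what later allows queries by entity or by type rather than by raw text.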

Page 16: Content Discovery Through Entity Driven Search

Smart Autocomplete

• Multi-phase suggestions

• Closer to natural language query formulation

• Named entities infix

• Entity types infix

• Multi-language entity type support

• Property-driven query approach
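
A minimal sketch of how such multi-phase, infix-based suggestions might behave, assuming small in-memory lists in place of a real suggester index. The entity names, type labels, and properties are invented for illustration.

```python
# Hypothetical suggestion data; labels and properties are illustrative only.
NAMED_ENTITIES = ["Barack Obama", "Michelle Obama", "Bank of England"]
ENTITY_TYPES = {
    "Person": {"en": "person", "es": "persona"},
    "Organization": {"en": "organization", "es": "organización"},
}
TYPE_PROPERTIES = {
    "Person": ["birth place", "profession"],
    "Organization": ["headquarters"],
}


def suggest(fragment, lang="en"):
    """Phase 1: infix match on entity names and on localized type labels."""
    needle = fragment.lower()
    entities = [
        ("entity", name) for name in NAMED_ENTITIES if needle in name.lower()
    ]
    types = [
        ("type", type_id)
        for type_id, labels in ENTITY_TYPES.items()
        if needle in labels.get(lang, "")
    ]
    return entities + types


def suggest_properties(type_id, fragment=""):
    """Phase 2: once a type is chosen, suggest its properties to refine the query."""
    needle = fragment.lower()
    return [p for p in TYPE_PROPERTIES.get(type_id, []) if needle in p]


print(suggest("bam"))
print(suggest("pers", lang="es"))
print(suggest_properties("Person", "pro"))
```

Infix matching means the fragment can appear anywhere in the label, so "bam" still finds both "Obama" entities, and a Spanish-speaking user typing "pers" reaches the same `Person` type through its localized label.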

Page 17: Content Discovery Through Entity Driven Search

Smart Autocomplete Configuration

• Entity type properties

• Interesting to our use case and scenario

• Properties inheritance through type hierarchy

• Enhance type information from external resources

• Freebase, DBpedia, custom data set
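
Property inheritance through a type hierarchy can be illustrated with a short sketch. The hierarchy and the property names below are made up; in practice they would come from configuration or from an external resource.

```python
# Hypothetical type hierarchy (child -> parent) and per-type configured properties.
PARENT = {"Politician": "Person", "Person": "Thing", "Thing": None}
OWN_PROPERTIES = {
    "Thing": ["name"],
    "Person": ["birth place", "nationality"],
    "Politician": ["party", "office"],
}


def properties_for(type_id):
    """Collect properties from the type and all its ancestors, nearest first."""
    collected = []
    current = type_id
    while current is not None:
        for prop in OWN_PROPERTIES.get(current, []):
            if prop not in collected:
                collected.append(prop)
        current = PARENT.get(current)
    return collected


print(properties_for("Politician"))
```

With this arrangement only the properties of interest need to be configured at the most general type that carries them; more specific types pick them up automatically.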

Page 18: Content Discovery Through Entity Driven Search

Semantic Search

• Search by Named Entity

• Search by Entity Type

• Search by Entity Type properties

• Grouping Results by Sense

• Contextualize Results Using Semantic Information
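
The search modes listed above can be sketched against a tiny enriched index. The documents, `kb:` URIs, and types are hypothetical, and a real deployment would run these as queries against the search engine rather than as Python loops.

```python
from collections import defaultdict

# Hypothetical enriched index: each document lists the entities found in it.
INDEX = [
    {"id": "d1", "entities": [{"uri": "kb:Jaguar_Cars", "type": "Company"}]},
    {"id": "d2", "entities": [{"uri": "kb:Jaguar_animal", "type": "Animal"}]},
    {"id": "d3", "entities": [{"uri": "kb:Jaguar_Cars", "type": "Company"},
                              {"uri": "kb:Coventry", "type": "City"}]},
]


def search_by_entity(uri):
    """Documents that mention one specific named entity."""
    return [d["id"] for d in INDEX if any(e["uri"] == uri for e in d["entities"])]


def search_by_type(entity_type):
    """Documents that mention any entity of the given type."""
    return [
        d["id"] for d in INDEX
        if any(e["type"] == entity_type for e in d["entities"])
    ]


def group_by_sense(candidate_uris):
    """Group matching documents under each sense (entity) of an ambiguous term."""
    groups = defaultdict(list)
    for doc in INDEX:
        for entity in doc["entities"]:
            if entity["uri"] in candidate_uris:
                groups[entity["uri"]].append(doc["id"])
    return dict(groups)


print(search_by_entity("kb:Jaguar_Cars"))
print(search_by_type("City"))
print(group_by_sense({"kb:Jaguar_Cars", "kb:Jaguar_animal"}))
```

Grouping by sense is what separates the car maker from the animal for a query like "jaguar", something a keyword match on the text alone cannot do.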

Page 19: Content Discovery Through Entity Driven Search

Semantic More Like This

• Search for Similar Documents based on Entities and Entities’ categories

• Similarity Function based on Documents’ Sense

• Not based on text tokens

• Entity Frequency / Inverse Document Frequency

• Entity Type Frequency / Inverse Document Frequency
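
One way to read the entity-frequency weighting above is as TF-IDF computed over entities instead of text tokens. The sketch below makes that concrete under stated assumptions: the corpus is invented, the smoothed IDF formula and the cosine similarity are choices made here for illustration, and the slides do not specify the exact function used.

```python
import math
from collections import Counter

# Hypothetical corpus: document id -> entity URIs found in it (with repeats).
CORPUS = {
    "d1": ["kb:London", "kb:London", "kb:Thames"],
    "d2": ["kb:London", "kb:Thames", "kb:Big_Ben"],
    "d3": ["kb:Paris", "kb:Seine"],
}


def idf(entity):
    """Smoothed inverse document frequency of an entity across the corpus."""
    containing = sum(1 for ents in CORPUS.values() if entity in ents)
    return math.log((1 + len(CORPUS)) / (1 + containing)) + 1.0


def vector(doc_id):
    """Entity frequency times IDF for every entity in the document."""
    counts = Counter(CORPUS[doc_id])
    return {entity: freq * idf(entity) for entity, freq in counts.items()}


def cosine(a, b):
    dot = sum(weight * b.get(entity, 0.0) for entity, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(doc_id):
    """Rank the other documents by entity-vector similarity to doc_id."""
    query = vector(doc_id)
    scores = {
        other: cosine(query, vector(other))
        for other in CORPUS
        if other != doc_id
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = more_like_this("d1")
print(ranked)
```

The same computation can be repeated over entity types instead of entity URIs to obtain the second signal, and the two scores can then be combined.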

Page 22: Content Discovery Through Entity Driven Search

Future Work

• New approach to Semantic More Like This (graph relations)

• Machine Learning components: Classification, Topic annotation, Clustering

• Semantic facets

• Secured Entity Search

• Image and Media searches

Page 23: Content Discovery Through Entity Driven Search

Conclusions

• Better user experience

• More precision in search results

• Closer to human language

Page 24: Content Discovery Through Entity Driven Search

Zaizi Headquarters
Brook House
4th Floor, North Wing
229-243 Shepherd’s Bush Road
London W6 7AN
United Kingdom
T: (+44) 20 3582 8330

Zaizi Iberia
Calle Gremios 13-15, Edificio Diseño
Planta 1, Oficina 5
41927 Mairena del Aljarafe
Sevilla
Spain
T: (+34) 666 42 43 64

Zaizi Asia
50 Flower Road
Colombo 07
Sri Lanka
T: (+94) 112 301 461

Zaizi Singapore
14 Robinson Road #13-00
Far East Finance Building
Singapore 048545
T: (+65) 3158 5886
F: (+65) 6323 1839

VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931

www.zaizi.com

Thanks!