chalitha perera | cross media concept and entity driven search for enterprise
Work Together Effectively
Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera and Dileepa Jayakody, R&D Engineers
• Headquartered in London, with an office in Colombo, Sri Lanka
• Focused on delivering enterprise content management solutions
• Our Skills
Zaizi R&D Department
• Giving sense to the content – Enriching it semantically
• Adding value to ECM/CMS – More structured content, easy to manage, link and search
• Improving search
– Across different domains, data sources, User Experience
• Machine Learning applied research
Agenda
• Problem
• Solution
• Sensefy and MICO
• Demo
• Q&A
Problem
• Unstructured Text Content
– Text documents, PDFs, Word …
• Rapid growth in multimedia content
• Heterogeneous Data Sources
– ECMs (Alfresco, SharePoint), File System, Confluence, JIRA …
• Data is not useful without effective methods for
– Knowledge Extraction
– Information Retrieval
Current Enterprise Search Limitations
• Limited to keyword based search
• Search context is not considered
• Ambiguity of terms
• Low precision
• Inability to properly handle multimedia files
Desired Traits of a Solution
• Semantically enhance documents
– Unstructured text
– Multimedia documents
• Cross media search
• Search with semantic concepts and entities
• Federated Search
– Search across different content repositories
– User permissions
Sensefy
• Semantic Enterprise Search Engine
• Cross Media Search
• Federated Search
• Smart Search Assistance
• Open Source
Sensefy Architecture
Repository Crawler
• Four types of connectors
– Repository Connectors
– Authority Connectors
– Transformation Connectors
– Output Connectors
• Connect different source repositories with different target indexes
– Source repositories (Alfresco, SharePoint, Confluence, etc.)
– Target indexes (Solr, Elasticsearch, Amazon CloudSearch)
• Security Model to enforce source repository security policies
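As a rough illustration of how the four connector roles and the security model might fit together, here is a minimal Python sketch. It is not Sensefy's implementation: every class, function, user, and group name below is hypothetical, and the target index is replaced by an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: Set[str]  # ACL copied from the source repository
    metadata: Dict[str, str] = field(default_factory=dict)


def repository_connector() -> Iterable[Document]:
    """Hypothetical repository connector: yields documents with their source ACLs."""
    yield Document("doc-1", "Quarterly sales report", {"sales"})
    yield Document("doc-2", "Public holiday calendar", {"everyone"})


def authority_connector(user: str) -> Set[str]:
    """Hypothetical authority connector: maps a user to access groups."""
    directory = {"alice": {"sales", "everyone"}, "bob": {"everyone"}}
    return directory.get(user, set())


def transformation_connector(doc: Document) -> Document:
    """Hypothetical transformation connector: adds metadata before indexing."""
    doc.metadata["length"] = str(len(doc.text))
    return doc


class InMemoryOutputConnector:
    """Hypothetical output connector: stands in for a real target index."""

    def __init__(self) -> None:
        self.index: List[Document] = []

    def add(self, doc: Document) -> None:
        self.index.append(doc)

    def search(self, term: str, user_groups: Set[str]) -> List[str]:
        # Return only documents the user may see in the source repository.
        return [
            d.doc_id
            for d in self.index
            if term.lower() in d.text.lower() and d.allowed_groups & user_groups
        ]


def crawl(output: InMemoryOutputConnector) -> None:
    # Source repository -> transformation -> target index.
    for doc in repository_connector():
        output.add(transformation_connector(doc))


output = InMemoryOutputConnector()
crawl(output)
alice_hits = output.search("report", authority_connector("alice"))
bob_hits = output.search("report", authority_connector("bob"))
```

In this sketch the ACL travels with each document into the index, so the permission check happens at query time against the groups returned by the authority connector.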
Media In Context (MICO) Platform
• MICO provides an integrated platform for
– Cross media analysis
– Metadata publishing
– Metadata querying
• Sensefy uses MICO as the cross media analysis engine to extract entities and concepts from multimedia
Cross Media Extraction Pipeline
Semantic Content Enrichment
• Named Entity Recognition
– People, places, organizations and concepts
• Entity Linking
– DBpedia, Yago, Custom Enterprise knowledge bases
• Entity Disambiguation
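To make the linking and disambiguation steps concrete, here is a toy Python sketch. It assumes a hand-made two-entry knowledge base with invented `kb:` identifiers and picks the candidate whose context words overlap most with the text. This is a simple stand-in heuristic, not the method Sensefy uses.

```python
from typing import Dict, NamedTuple, Optional, Set


class KbEntry(NamedTuple):
    label: str         # surface form that may appear in text
    context: Set[str]  # words that typically co-occur with the entity


# Hypothetical mini knowledge base; identifiers and context words are illustrative.
KNOWLEDGE_BASE: Dict[str, KbEntry] = {
    "kb:Cristiano_Ronaldo": KbEntry("ronaldo", {"portugal", "cristiano"}),
    "kb:Ronaldo": KbEntry("ronaldo", {"brazil"}),
}


def link_entity(mention: str, text: str) -> Optional[str]:
    """Link a mention to the candidate with the largest context overlap."""
    tokens = set(text.lower().replace(",", " ").replace(".", " ").split())
    # Entity linking: collect every knowledge base entry matching the mention.
    candidates = [
        uri for uri, entry in KNOWLEDGE_BASE.items()
        if entry.label == mention.lower()
    ]
    if not candidates:
        return None
    # Disambiguation: prefer the candidate sharing the most context words.
    return max(candidates, key=lambda uri: len(KNOWLEDGE_BASE[uri].context & tokens))


linked = link_entity("Ronaldo", "Ronaldo scored for Portugal in the final.")
unknown = link_entity("Messi", "Messi scored in the final.")
```

With a real knowledge base such as DBpedia, the candidate lookup and the scoring would be far richer, but the two stages stay the same.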
Entity Search with Suggestions
• Named Entity Suggestions
• Ability to query with disambiguated entities
• Search results with high precision
– Keyword search for “ronaldo” returns documents about both “Cristiano Ronaldo” and “Ronaldo”
– Entity search returns only the documents related to the selected entity
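The difference between the two search modes can be sketched in a few lines of Python. The documents, their entity annotations, and the `kb:` identifiers are invented for illustration, and the sketch assumes the entities were attached to each document during enrichment.

```python
from typing import List

# Hypothetical enriched documents: raw text plus linked entity identifiers.
DOCS = [
    {"id": "d1", "text": "Ronaldo scores twice for Portugal",
     "entities": {"kb:Cristiano_Ronaldo"}},
    {"id": "d2", "text": "Ronaldo lifts the trophy with Brazil",
     "entities": {"kb:Ronaldo"}},
    {"id": "d3", "text": "Transfer news round-up", "entities": set()},
]


def keyword_search(term: str) -> List[str]:
    """Match the raw term anywhere in the text, regardless of meaning."""
    return [d["id"] for d in DOCS if term.lower() in d["text"].lower()]


def entity_search(entity_uri: str) -> List[str]:
    """Match only documents annotated with the selected entity."""
    return [d["id"] for d in DOCS if entity_uri in d["entities"]]


keyword_hits = keyword_search("ronaldo")
entity_hits = entity_search("kb:Cristiano_Ronaldo")
```

Here the keyword query matches both players, while the entity query returns only the document about the selected one.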
Entity Search with Suggestions
• Combine entities and concepts for more complex queries
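One plausible way to combine entities and concepts in a single query is to intersect their posting lists. The sketch below assumes two small inverted indexes with made-up identifiers and document IDs. It illustrates the idea only and does not describe Sensefy's query engine.

```python
from typing import Dict, List, Set

# Hypothetical inverted indexes: identifier -> IDs of documents that mention it.
ENTITY_INDEX: Dict[str, Set[str]] = {
    "kb:Cristiano_Ronaldo": {"d1", "d4"},
    "kb:Ronaldo": {"d2"},
}
CONCEPT_INDEX: Dict[str, Set[str]] = {
    "concept:football": {"d1", "d2"},
    "concept:advertising": {"d4"},
}


def combined_search(entities: List[str], concepts: List[str]) -> List[str]:
    """Return documents that match every requested entity and concept."""
    postings = [ENTITY_INDEX.get(e, set()) for e in entities]
    postings += [CONCEPT_INDEX.get(c, set()) for c in concepts]
    if not postings:
        return []
    # Intersection keeps only documents present in all posting lists.
    return sorted(set.intersection(*postings))


ad_hits = combined_search(["kb:Cristiano_Ronaldo"], ["concept:advertising"])
football_hits = combined_search(["kb:Cristiano_Ronaldo"], ["concept:football"])
```

Intersection gives AND semantics. A union over the same posting lists would give OR semantics if broader recall is wanted.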
DEMO
Q&A
Thank you.