II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)
![Page 1: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/1.jpg)
The Road to Federated Text Mining: Are we there yet?
II-SDV 2014
Guy Singh
![Page 2: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/2.jpg)
What is federated search?
“Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines participating in the federation”
- Wikipedia
![Page 3: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/3.jpg)
Current Situation
• Volume of data is ever increasing
• Proprietary content can reside within the enterprise
• No need for everyone to keep standard sources up-to-date
• Data from content providers can reside on their sites
Linguamatics Customer Confidential 3
Internal Content External Content
MEDLINE Clinical Trials
Publisher Content
FDA Drug Labels
Patents
![Page 4: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/4.jpg)
Increasing Range of Data Sources
Data Sources
Scientific Literature
Social Media
News
Web Pages
Internal Documents
Patents
RSS
Clinical Trials
![Page 5: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/5.jpg)
Varying in Structure
![Page 6: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/6.jpg)
How does text mining differ from keyword search?
Example: What genes affect breast cancer?
![Page 7: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/7.jpg)
Why does document structure matter?
• Searching across documents using keywords is relatively trivial
– No need to be aware of where the words occur or in what context
• Text mining documents with varying structure requires a more sophisticated approach. We need to:
– Know where words matching entities/concepts occur
– Disambiguate depending on context and location
– Find terms in particular regions/parts of a document for targeted searches
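A minimal sketch of the difference, assuming a hypothetical document split into named regions (the region names and sample text are illustrative, not taken from any product). Keyword search only reports that a term occurs somewhere; a region-aware search reports where, so a hit in the reference list can be excluded from a targeted search.

```python
import re

# Hypothetical document with named regions; names and text are illustrative only.
doc = {
    "title": "BRCA1 mutations in breast cancer",
    "abstract": "We studied BRCA1 in patients with breast cancer.",
    "references": "Smith J. TP53 and lung cancer. 2010.",
}

def keyword_hit(doc, term):
    """Keyword search: True if the term occurs anywhere in the document."""
    text = " ".join(doc.values())
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None

def region_hits(doc, term, regions):
    """Region-aware search: (region, offset) for each match in the chosen regions."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [(name, m.start())
            for name in regions if name in doc
            for m in pattern.finditer(doc[name])]
```

Here `keyword_hit(doc, "TP53")` matches because of the reference list, whereas `region_hits(doc, "TP53", ["title", "abstract"])` finds nothing.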
![Page 8: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/8.jpg)
Approaches to dealing with different data sources
• Integrate the data into a data warehouse
– Extract, Transform and Load each data source into a new database
– Multiple copies of the data
– Data normalisation can be difficult
– Time-consuming and expensive process
– Most database vendors take this approach
– Allows users to perform a single search across all the content
• Leave the data where it is: federated content
– Data remains in its original form and location
– Multiple data types
– Multiple network locations
– Single search across multiple different data sources
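The federated approach can be sketched as a fan-out: one query is sent to every source and the hits come back tagged with their origin. This is only an illustration; `SOURCES`, `search_source` and `federated_search` are hypothetical names, and the sources are in-memory stand-ins for independent search engines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for independent search engines.
SOURCES = {
    "internal_documents": ["Internal report on aspirin dosing", "Meeting notes"],
    "fda_drug_labels": ["Aspirin label: warnings", "Ibuprofen label: dosage"],
}

def search_source(name, query):
    """Search one source where it lives and tag each hit with its origin."""
    q = query.lower()
    return [(name, doc) for doc in SOURCES[name] if q in doc.lower()]

def federated_search(query):
    """Send one query to every source and concatenate the per-source hits."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda n: search_source(n, query), names)
        return [hit for hits in per_source for hit in hits]
```

No data is copied into a central store; each source keeps its own form and location, and only the results travel.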
![Page 9: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/9.jpg)
How do we get to Federated Text Mining?
Data Normalisation
Link the Content Servers
Merge Results
Federated Text Mining
![Page 10: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/10.jpg)
Data Normalisation – Virtual Indexes
Pathology Reports Index
Journal Abstracts Index
Virtual Index
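One way to picture a virtual index is as a thin layer that forwards a search to several underlying indexes and unions the hits, without copying their data. The toy classes below are assumptions made for illustration and do not describe how I2E implements virtual indexes.

```python
class Index:
    """Toy inverted index: term -> set of document ids (hypothetical)."""

    def __init__(self, name, docs):
        self.name = name
        self.postings = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        """Return (index name, document id) pairs for documents containing the term."""
        return {(self.name, d) for d in self.postings.get(term.lower(), set())}


class VirtualIndex:
    """Presents several underlying indexes as one without copying their data."""

    def __init__(self, *indexes):
        self.indexes = indexes

    def search(self, term):
        hits = set()
        for index in self.indexes:
            hits |= index.search(term)
        return hits


# Sample content is invented for the example.
pathology = Index("pathology_reports",
                  {"p1": "invasive ductal carcinoma", "p2": "benign lesion"})
abstracts = Index("journal_abstracts", {"a1": "carcinoma risk and BRCA1"})
virtual = VirtualIndex(pathology, abstracts)
```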
![Page 11: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/11.jpg)
Data Normalisation – Document Structure
Pathology Reports
Journal Abstracts
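Normalising document structure can be sketched as renaming each source's own section names to a shared set of region names, so that one query can target the same region in both kinds of document. The section names and the `REGION_MAP` table are hypothetical.

```python
# Hypothetical mapping from source-specific section names to shared region names.
REGION_MAP = {
    "pathology_reports": {"Final Diagnosis": "conclusion",
                          "Clinical History": "background"},
    "journal_abstracts": {"Conclusions": "conclusion",
                          "Background": "background"},
}

def normalise_regions(source, document):
    """Rename sections to the shared region names; unmapped sections keep their name."""
    mapping = REGION_MAP[source]
    return {mapping.get(section, section): text
            for section, text in document.items()}

# Invented sample documents.
report = {"Clinical History": "Lump in left breast.",
          "Final Diagnosis": "Ductal carcinoma."}
abstract = {"Background": "BRCA1 is linked to cancer.",
            "Conclusions": "Risk is elevated."}
```

After normalisation both documents expose `background` and `conclusion` regions, whatever the original headings were.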
![Page 12: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/12.jpg)
Data Normalisation – Entities
Journal Abstracts
Pathology Reports
Combined (Normalized)
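Entity normalisation can be sketched as mapping the different surface forms found in each source to one preferred term before combining counts. The synonym table below is a hypothetical stand-in; a real system would draw on curated terminologies.

```python
# Hypothetical synonym table: surface form (lower case) -> preferred term.
PREFERRED_TERM = {
    "breast carcinoma": "Breast Neoplasms",
    "breast cancer": "Breast Neoplasms",
    "carcinoma of breast": "Breast Neoplasms",
    "her2": "ERBB2",
    "erbb2": "ERBB2",
}

def normalise_entity(mention):
    """Map a surface form to its preferred term, or return it unchanged if unknown."""
    return PREFERRED_TERM.get(mention.strip().lower(), mention)

def combine(*hit_lists):
    """Combine mentions from several sources into counts per preferred term."""
    counts = {}
    for hits in hit_lists:
        for mention in hits:
            term = normalise_entity(mention)
            counts[term] = counts.get(term, 0) + 1
    return counts
```

For example, `combine(["Breast cancer", "HER2"], ["carcinoma of breast", "ERBB2"])` groups four differently worded mentions under two preferred terms.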
![Page 13: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/13.jpg)
Linking Content Servers
![Page 14: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/14.jpg)
Linked Servers
Development Status
• I2E 4.1 introduced a new feature – Linked Server
• One I2E server can be linked to another I2E server
• Provides access to remote and local indexes and queries through a single I2E interface (Linked Servers)
– Indexes and queries on remote servers on the network appear the same as local indexes
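The idea that remote indexes appear alongside local ones can be pictured with a toy model. The `Server` class and its methods are hypothetical and are not the I2E API; they only illustrate one listing that spans a local server and a linked one.

```python
class Server:
    """Hypothetical server holding named indexes; not the I2E API."""

    def __init__(self, name, indexes):
        self.name = name
        self.local_indexes = list(indexes)
        self.linked = []

    def link(self, other):
        """Link another server so that its indexes become visible here."""
        self.linked.append(other)

    def list_indexes(self):
        """Return local indexes, then those of linked servers, as (index, host) pairs."""
        listing = [(ix, self.name) for ix in self.local_indexes]
        for server in self.linked:
            listing += [(ix, server.name) for ix in server.local_indexes]
        return listing


enterprise = Server("enterprise", ["Internal Documents"])
on_demand = Server("ondemand", ["MEDLINE", "FDA Drug Labels"])
enterprise.link(on_demand)
```

A client talking only to `enterprise` then sees all three indexes in one list.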
![Page 15: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/15.jpg)
I2E 4.1 Linked Servers
I2E Enterprise on Customer network
I2E OnDemand SaaS
Infrastructure
In-house Indexes
I2E OnDemand Standard Indexes
I2E Enterprise Access
Custom Indexes
Access via Linked Servers
Access via single UI
![Page 16: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/16.jpg)
Merging Results (Part I)
Single Server, Multiple Queries
![Page 17: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/17.jpg)
I2E 3.0 (2009) – Merging Results (part I) from one server
Profiling Individuals
• Example from news reports related to the pharmaceutical industry
• Pick up properties from one document or many
© Linguamatics 2012 - Customer Confidential
![Page 18: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/18.jpg)
© Linguamatics 2013 - Confidential
I2E 3.0 – Merging Results (part I) from one server
Document identifier
Patient information
Disease history
Patient data
Medications and dosages
Hit displayed in context
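Merging the results of several queries from one server can be sketched as building one row per document identifier, with one column per query. The function name and sample data are hypothetical.

```python
def merge_by_document(**query_results):
    """Merge the outputs of several queries into one row per document identifier.

    Each keyword argument maps a column name to {document_id: [extracted values]}.
    """
    rows = {}
    for column, results in query_results.items():
        for doc_id, values in results.items():
            rows.setdefault(doc_id, {})[column] = values
    return rows


# Hypothetical outputs of two separate queries run on the same server.
disease_history = {"doc-1": ["type 2 diabetes"], "doc-2": ["hypertension"]}
medications = {"doc-1": ["metformin 500 mg"]}
merged = merge_by_document(disease_history=disease_history,
                           medications=medications)
```

A document that only one query matched still gets a row; it simply lacks the other column.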
![Page 19: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/19.jpg)
Merging Results (Part II)
Multiple Servers, Multiple Queries
![Page 20: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/20.jpg)
Each server supplies a separate set of results
Content Server 1
Content Server 2
Content Server 3
Content Server 4
Merge into a single set of results
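A minimal sketch of the merge step, assuming each content server returns hit counts per normalised concept. The server names, concepts and counts are invented for the example.

```python
from collections import defaultdict

def merge_server_results(results_by_server):
    """Merge per-server result tables into one table keyed by concept.

    results_by_server maps a server name to {concept: hit_count}.
    Returns rows of (concept, total_hits, servers), highest total first.
    """
    totals = defaultdict(int)
    servers = defaultdict(list)
    for server in sorted(results_by_server):
        for concept, hits in results_by_server[server].items():
            totals[concept] += hits
            servers[concept].append(server)
    return sorted(((c, totals[c], servers[c]) for c in totals),
                  key=lambda row: (-row[1], row[0]))


# Hypothetical per-server counts for the same query.
results = {
    "content_server_1": {"BRCA1": 12, "TP53": 4},
    "content_server_2": {"BRCA1": 3, "ERBB2": 7},
}
```

Summing only makes sense when the servers agree on concept names, which is why entity normalisation has to come before merging.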
![Page 21: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/21.jpg)
The Road to Federated Text Mining
![Page 22: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/22.jpg)
Linking Content Servers
![Page 23: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/23.jpg)
I2E 4.0: Multiple Clients, Multiple Results
I2E Server 2 FDA Drug Labels
I2E Server 1 Internal Documents
external network internal network
![Page 24: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/24.jpg)
I2E 4.1/4.2: Single Client, Multiple Results
I2E Server 2 FDA Drug Labels
I2E Server 1 Internal Documents
external network internal network
Linked server
![Page 25: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/25.jpg)
Merging Results (Part II)
![Page 26: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/26.jpg)
Q4 2014: Single Client, Single Result, Multiple Servers
I2E Server 2 FDA Drug Labels
I2E Server 1 Internal Documents
external network internal network
Linked server
![Page 27: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/27.jpg)
Q4 2014: Federated Text Mining Example
• Single Query
• Differently structured data sources on different servers
– Journal Articles (PubMed Central) on Enterprise Server
– MEDLINE on I2E OnDemand
• Single set of results
![Page 28: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/28.jpg)
The Road to Federated – Are we there yet?
I2E 4.0: Dec 2012
I2E 4.1: October 2013
Next release (in development): Q4 2014
Merging the Results (part II)
Data Normalisation
Linking Content Servers
![Page 29: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/29.jpg)
Demo
![Page 30: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/30.jpg)
Demo
Cambridge
VPN
Nice
Linked Server
Journal Abstracts
Pathology Reports
![Page 31: II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)](https://reader035.vdocuments.us/reader035/viewer/2022070315/554f9ee5b4c905ad218b48e8/html5/thumbnails/31.jpg)
Thank you