text mining: the next data frontier · 2016-05-11 · to document and classify text mining...
![Page 1: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/1.jpg)
TEXT MINING: THE NEXT DATA FRONTIER
An Infrastructural Approach
@openminted_eu
Dr. Petr Knoth, CORE (core.ac.uk)
Knowledge Media Institute, The Open University, United Kingdom
![Page 2: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/2.jpg)
OpenMinTeD
Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
![Page 3: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/3.jpg)
Beyond Open Access
MAKING SENSE OF LARGE VOLUMES OF SCIENTIFIC CONTENT
![Page 4: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/4.jpg)
The phases of text mining
[Figure: four-stage diagram, STAGE 1 to STAGE 4. Labels: Information Retrieval, NLP Analysis, Entity Recognition, Information Extraction, Data Mining, Knowledge Discovery.]
OPENMINTED - The Open Mining Infrastructure for Text and Data
![Page 5: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/5.jpg)
TDM challenges for researchers
1. Content challenges
- Barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues
- No uniform way to search for, retrieve and access content for TDM
![Page 6: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/6.jpg)
TDM challenges for researchers
2. Services challenges
How to identify the most fitting TDM service?
How to combine it with other TDM services I have access to?
How to use them on my content?
![Page 7: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/7.jpg)
TDM challenges for researchers
3. Processing challenges
Where to deploy? Are my machines powerful enough?
How can I get access to powerful machines?
Where to store intermediate and final results?
How to ensure persistence of storage?
![Page 8: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/8.jpg)
OpenMinTeD – Provides solutions
An open and sustainable TDM infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based, science-related sources.
![Page 9: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/9.jpg)
OpenMinTeD – working on many fronts
- ACCESSIBLE CONTENT: via standardised programmatic interfaces
- DISCOVERABLE SERVICES: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
- EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs
- RESEARCH COMMUNITIES: different scientific communities have different challenges
- VALUE-ADDED APPS: community-driven applications to illustrate the value of the infrastructure; engage with industry
![Page 10: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/10.jpg)
The project
Started: June 2015
Duration: 3 years
Budget of: €6 million
Grant of: €5.3 million
16 Partners:
- 6 mining research groups
- 3 content providers
- 1 data center
- 1 library association
- 2 legal experts
- 6 community related partners
- 2 SMEs
PARTNERS
Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK (CORE), EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling
![Page 11: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/11.jpg)
The OpenMinTeD landscape
![Page 12: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/12.jpg)
Infrastructural approach
OpenMinTeD does not build new services, but adopts and adapts existing services for new communities
![Page 13: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/13.jpg)
Infrastructural approach
Focuses on interoperability across text mining services and content provision outlets
![Page 14: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/14.jpg)
Infrastructural approach
Creates an open and collaborative space for researchers to use the best-fitting text mining services available, building on the cloud computing philosophy
![Page 15: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/15.jpg)
Overview
[Figure: architecture diagram with three interoperability layers.]
Layer 1: Interoperability of text mining services (platforms or components)
Layer 2: Interoperability of language resources & corpora
Layer 3: Interoperability to shared storage and computing resources
Diagram elements:
- Users: researchers, curators, text-miners and new services developers
- Platform services: Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting
- Language resources and corpora registry service
- Mining platforms; proprietary architectures
- Language resources
- Text corpora: publisher, OpenAIRE/CORE, PMC, other text corpora, other types of text corpora
- Data centres; in public cloud
![Page 16: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/16.jpg)
Interoperability framework
Bringing together mining tools, resources and content
1. Content metadata & transfer standards
To document scientific literature, language resources, taxonomies and provenance as well as transfer protocols for full text retrieval
![Page 17: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/17.jpg)
Interoperability framework
Bringing together mining tools, resources and content
2. Service metadata & pipelining
To document and classify text mining services, how they receive input, in what form they output their results, how they combine for workflows, what granularity to consider.
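As an illustration of what such service metadata makes possible, here is a minimal sketch; it is not OpenMinTeD's actual schema. It assumes each service declares the formats it accepts and the format it produces, so that a workflow can be checked before it is run. The class `ServiceDescriptor`, the function `can_pipeline`, and all service and format names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical metadata record for a text mining service."""
    name: str
    category: str              # classification, e.g. "tokenizer"
    input_formats: frozenset   # formats the service accepts
    output_format: str         # format the service produces


def can_pipeline(services):
    """Return True if each service's output is accepted by the next one."""
    return all(
        upstream.output_format in downstream.input_formats
        for upstream, downstream in zip(services, services[1:])
    )


# Illustrative services; names and formats are made up for this example.
tokenizer = ServiceDescriptor(
    "tokenizer", "tokenizer", frozenset({"text/plain"}), "tokens/json")
tagger = ServiceDescriptor(
    "entity-tagger", "entity-recognizer",
    frozenset({"tokens/json"}), "annotations/json")

print(can_pipeline([tokenizer, tagger]))  # True
print(can_pipeline([tagger, tokenizer]))  # False
```

The check only compares declared formats; questions such as annotation granularity would need additional metadata fields.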
![Page 18: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/18.jpg)
Interoperability framework
Bringing together mining tools, resources and content
3. IPR and licensing
To study IPR restrictions and describe license metadata for the re-use of content and of TDM services & tools, together with information on how it applies to academic and non-commercial mining research
![Page 19: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/19.jpg)
OpenMinTeD users
1. End users
- Researchers, database curators, …
- Novice: use services to advance their science
- Advanced: combine TDM services into complex workflows
![Page 20: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/20.jpg)
OpenMinTeD users
2. Content and service providers
- Publishers, libraries, scientific database centres, …
- TDM researchers
- SMEs
![Page 21: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/21.jpg)
Bottom-up approach
OpenMinTeD works with 4 use cases, which give their requirements and evaluate the results:
- RESEARCH ANALYTICS
- SOCIAL SCIENCES
- AGRICULTURE
- LIFE SCIENCES
![Page 22: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/22.jpg)
OpenMinTeD use case 1
Scholarly communication analytics
- Semantic search and discovery of open scientific outcomes
- Map of academia: scholarly communication network
- Research monitoring and analytics
Partners: CORE/OU, OpenAIRE/ARC, Frontiers
![Page 23: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/23.jpg)
OpenMinTeD use case 2
Life sciences
- Assisted curation of the EMBL-EBI chemical databases for metabolomics
- Curation of the neurosciences resources KnowledgeBase and Neurolex
Partners: EBI - Metabolomics, Human Brain Project
![Page 24: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/24.jpg)
OpenMinTeD use case 3
Agriculture and biodiversity
- Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls
- Image, figure and dataset discovery in AGRIS
Partners: INRA, Agro-Know
![Page 25: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/25.jpg)
OpenMinTeD use case 4
Social sciences
Develop and evaluate methods for the automatic detection and linking of named entities, citation traces and intentions in scientific publications in the social sciences
Partners: GESIS
![Page 26: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/26.jpg)
What can OpenMinTeD do for you?
Are you a content provider?
Make your content available for mining.
Register your collections in the OpenMinTeD registry and let others discover them.
![Page 27: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/27.jpg)
What can OpenMinTeD do for you?
Are you a TDM service provider?
Share and collaborate with other TDM services.
Register your TDM service in the OpenMinTeD registry and let others discover it.
![Page 28: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/28.jpg)
What can OpenMinTeD do for you?
Are you a text miner or a researcher who can benefit from text mining?
Use OpenMinTeD (when launched)
![Page 29: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/29.jpg)
Conclusions
- The ability to text-mine research literature at scale can redefine the way we do research
- OpenMinTeD is laying the groundwork (interoperability) and building the cloud infrastructure for text-mining research literature
- OpenMinTeD is building an open, transparent infrastructure that enables others to participate
![Page 30: TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining services, how they receive input, in what form they output their results, how they combine](https://reader034.vdocuments.us/reader034/viewer/2022042116/5e9353a631aa4219553fc532/html5/thumbnails/30.jpg)
Contact us
www.openminted.eu
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus