Supporting the Research Process
The NaCTeM Text Mining Service
William Black
Informatics, Manchester
Contents
• What is Text Mining? What is NaCTeM?
• Approaches/Methods
• Text Mining Tasks
– IE, Argumentative Zoning, Terminology Discovery
• End-user services for researchers
• NaCTeM activities with social scientists
What is Text Mining?
• Knowledge discovery from textual sources
– Primary sources
• Documents, news, Web
– Scientific literature
• Uses natural language processing (NLP), ontologies and information retrieval (IR) on a large scale
What is the National Centre for Text Mining (NaCTeM)? http://www.nactem.ac.uk
• Established in 2004 in response to a JISC/EPSRC/BBSRC initiative
• A Manchester and Liverpool collaboration
– Formerly also UMIST and Salford
– Accommodated in the Manchester Interdisciplinary Biocentre (MIB)
• Develops a variety of national services based on applications in the biological sciences, with deployment from autumn 2006
• Focus initially on the biological sciences, with a second focus on social science during 2006-7
Text Mining - Approaches
• Distinguished from IR by semantic analysis leading to the extraction of entities, facts and events, rather than merely retrieving documents.
• Distinguished from the Semantic Web by use of automated analysis based on robust natural language processing.
• A wide variety of methods and analyses ranging from domain-independent to domain-specific.
Methods of Text Mining
• Pipelined processes performing increasingly deep levels of analysis, common to all approaches
– Document structure analysis, tokenization, part-of-speech tagging, phrasal chunking, named entity recognition/classification, fact and event extraction
– Results indexed to provide conceptual IR services
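A minimal sketch of the pipelined idea, in which each stage consumes the previous stage's output. It covers only tokenization, tagging, chunking and named entity recognition, and it assumes tiny hand-written resources (the hypothetical LEXICON and GAZETTEER below) in place of the trained taggers and recognisers a real service would use.

```python
import re

# Hypothetical toy resources: a real pipeline would use a trained PoS tagger
# and a statistical named entity recognizer, not small lookup tables.
LEXICON = {"the": "DT", "a": "DT", "human": "JJ", "protein": "NN",
           "receptor": "NN", "binds": "VBZ", "activates": "VBZ"}
GAZETTEER = {"p53": "GENE", "MDM2": "GENE", "TNF": "GENE"}
NP_TAGS = {"DT", "JJ", "NN", "NNP"}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Lexicon-lookup tagging: gazetteer names -> NNP, punctuation -> '.',
    words missing from the lexicon default to NN."""
    tagged = []
    for tok in tokens:
        if tok in GAZETTEER:
            tagged.append((tok, "NNP"))
        elif re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, "."))
        else:
            tagged.append((tok, LEXICON.get(tok.lower(), "NN")))
    return tagged


def chunk(tagged):
    """Phrasal chunking: maximal runs of DT/JJ/NN/NNP become noun phrases."""
    chunks, current = [], []
    for tok, t in tagged:
        if t in NP_TAGS:
            current.append(tok)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


def recognize_entities(tokens):
    """Named entity recognition/classification by dictionary lookup."""
    return [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]


def run_pipeline(text):
    """Run the stages in order, keeping every intermediate analysis."""
    tokens = tokenize(text)
    tagged = tag(tokens)
    return {"tokens": tokens, "tagged": tagged,
            "chunks": chunk(tagged), "entities": recognize_entities(tokens)}


print(run_pipeline("The p53 protein binds MDM2."))
```

For "The p53 protein binds MDM2." the chunker yields the noun phrases "The p53 protein" and "MDM2", and both p53 and MDM2 are classified as GENE. Keeping every intermediate layer is what allows the results to be indexed for conceptual retrieval later.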
Sample text mining sub-tasks
• Named entity recognition and classification
• Terminology discovery and ontology maintenance
• Information extraction (IE) in limited domains, for intelligence analysts and scientists
• Summarization: informative, tailored, multilingual, multi-document
• Open-domain IE and question answering (QA)
• Association mining over databases of extracted facts
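Association mining over extracted facts can be illustrated with the standard support/confidence measures from association-rule mining. This is a minimal sketch under assumptions: the facts are hypothetical (subject, relation, object) triples grouped by document, and a rule "x -> y" simply means that documents with facts about x tend also to have facts about y. It is not presented as NaCTeM's method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (subject, relation, object) facts, grouped by source document.
FACTS_BY_DOC = {
    "doc1": [("p53", "binds", "MDM2"), ("MDM2", "degrades", "p53")],
    "doc2": [("p53", "activates", "p21")],
    "doc3": [("p53", "binds", "MDM2"), ("p53", "activates", "p21")],
}


def entity_sets(facts_by_doc):
    """One set per document: every entity appearing in any of its facts."""
    return [{e for subj, _rel, obj in facts for e in (subj, obj)}
            for facts in facts_by_doc.values()]


def association_rules(facts_by_doc, min_support=0.5, min_confidence=0.6):
    """Return (x, y, support, confidence) for rules 'x -> y' between entities."""
    docs = entity_sets(facts_by_doc)
    n = len(docs)
    single = Counter(e for d in docs for e in d)
    pairs = Counter(p for d in docs for p in combinations(sorted(d), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n  # fraction of documents mentioning both entities
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]  # P(y mentioned | x mentioned)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


for rule in association_rules(FACTS_BY_DOC):
    print(rule)
```

Counting at document level rather than sentence level is a design choice; restricting counts to facts that share a sentence would give fewer, tighter associations.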
Illustrations of IE on successive full-page screenshots
• Named entity phrase bracketing
• Named entity extraction
• Fact extraction and slot filling
• An application to a research literature
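Fact extraction and slot filling can be sketched as matching a pattern over text whose named entities have already been bracketed, then copying the matched spans into the slots of a template. The bracket notation, the verb list and the InteractionEvent template below are hypothetical illustrations, not the format used in the screenshots.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumes NE bracketing has already run, producing e.g. "[GENE p53]".
ENTITY = r"\[(?P<{name}_type>[A-Z]+) (?P<{name}>[^\]]+)\]"

# Hypothetical extraction pattern: <entity> <interaction verb> <entity>.
PATTERN = re.compile(
    ENTITY.format(name="agent")
    + r"\s+(?P<action>binds|activates|inhibits|phosphorylates)\s+"
    + ENTITY.format(name="target")
)


@dataclass
class InteractionEvent:
    """A filled template: one slot per role in the event."""
    agent: str
    agent_type: str
    action: str
    target: str
    target_type: str


def fill_slots(bracketed: str) -> List[InteractionEvent]:
    """Return one filled template per pattern match in the sentence."""
    return [
        InteractionEvent(m["agent"], m["agent_type"], m["action"],
                         m["target"], m["target_type"])
        for m in PATTERN.finditer(bracketed)
    ]


print(fill_slots("[GENE p53] binds [GENE MDM2] in vitro."))
```

A sentence whose entities are bracketed but which contains no listed interaction verb yields no filled templates; real IE systems replace the single regular expression with many patterns or a parser.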
Terminology Discovery - Ananiadou, NaCTeM
• A form of unsupervised learning, whose only required resource is a general-purpose part-of-speech (PoS) tagger.
• Can be applied to text in any language, domain or genre to reveal terminology on the basis of phrasehood and distribution.
• TerMine will be among the first NaCTeM tools to be deployed.
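TerMine is based on the C-value measure, which scores a multi-word candidate a as log2|a| · f(a) when a is not nested in any longer candidate, and as log2|a| · (f(a) − (1/P(T_a)) · Σ f(b) over b in T_a) otherwise. Here |a| is the length in words, f(a) the frequency, and T_a the set of longer candidates containing a. The sketch below implements only this basic formula; it assumes candidate phrases have already been extracted with a PoS-pattern filter, and the frequencies are hypothetical.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term as a sequence of words


def is_nested(shorter: Term, longer: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Basic C-value for multi-word candidates (length >= 2)."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested(a, b)]  # T_a
        if containers:
            # Discount occurrences that are explained by longer candidates.
            mean_nested = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - mean_nested)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores


# Hypothetical candidate frequencies from a PoS-filtered corpus.
FREQ = {
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 7,
    ("soft", "contact", "lens"): 4,
    ("contact", "lens"): 10,
}
for term, score in sorted(c_value(FREQ).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

The nested-frequency discount is what captures "distribution": "contact lens" keeps a high score because it also occurs often outside "soft contact lens", whereas "cell carcinoma" is mostly absorbed by its longer container.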
Argumentative Zoning - Simone Teufel, Cambridge Computer Laboratory
• BKG: General scientific background (yellow)
• OTH: Neutral descriptions of others' work (orange)
• OWN: Neutral descriptions of own, new work (blue)
• AIM: Statements of the particular aim of the current paper (pink)
• TXT: Statements of the textual organization of the current paper (red)
• CTR: Contrastive or comparative statements, including explicit mention of weaknesses of other work (green)
• BAS: Statements that own work is based on other work (purple)
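The seven zones above map naturally onto an enumeration, which is how a tool might carry the labels and highlight colours together. The labeler below is only a toy cue-phrase baseline with hypothetical cues; Teufel's Argumentative Zoning uses a trained classifier over many sentence features, and this sketch does not reproduce it.

```python
from enum import Enum


class Zone(Enum):
    # Value = (description, highlight colour used on the slide)
    BKG = ("General scientific background", "yellow")
    OTH = ("Neutral description of others' work", "orange")
    OWN = ("Neutral description of own, new work", "blue")
    AIM = ("Statement of the particular aim of the current paper", "pink")
    TXT = ("Statement of the textual organization of the current paper", "red")
    CTR = ("Contrastive or comparative statement", "green")
    BAS = ("Statement that own work is based on other work", "purple")

    @property
    def colour(self) -> str:
        return self.value[1]


# Hypothetical cue phrases, checked in order; the first match wins.
CUES = [
    ("in this paper we", Zone.AIM),
    ("the aim of this", Zone.AIM),
    ("section", Zone.TXT),
    ("however", Zone.CTR),
    ("unlike", Zone.CTR),
    ("based on", Zone.BAS),
    ("following", Zone.BAS),
    ("et al", Zone.OTH),
    (" we ", Zone.OWN),
]


def label_sentence(sentence: str) -> Zone:
    """Assign a zone by cue-phrase lookup; default to background."""
    s = f" {sentence.lower()} "
    for cue, zone in CUES:
        if cue in s:
            return zone
    return Zone.BKG


for sent in ["In this paper we propose a new parser.",
             "However, their method fails on long sentences.",
             "Text mining is widely used."]:
    zone = label_sentence(sent)
    print(zone.name, zone.colour, sent)
```

Ordering the cues matters: a sentence such as "Our approach is based on Smith et al." is labeled BAS rather than OTH because the BAS cue is checked first.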
Argumentative Zoning Example
End-user services based on full NLP and conceptual indexing
• Two conceptual IR services, based on a prior full-scale NLP analysis of MEDLINE carried out at the Tsujii Lab, University of Tokyo
– InfoPubMed: a complex tool supporting a research workflow for literature review and knowledge discovery/hypothesis generation
– Medie: a simple IR interface as intuitive as Google, but returning fact-bearing sentences, which tell the user more than mere document surrogates.
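Returning fact-bearing sentences rather than documents can be sketched as an index of predicate-argument triples, each linked to the sentence it was extracted from. The sketch assumes a subject-verb-object query form with wildcards and lemmatised verbs; the Fact structure and the index entries are hypothetical and do not reproduce Medie's actual interface or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """A predicate-argument triple plus the sentence it was extracted from."""
    subject: str
    verb: str      # assumed lemmatised by the parser ("activates" -> "activate")
    obj: str
    sentence: str  # the fact-bearing sentence shown to the user


def matches(value: str, query: Optional[str]) -> bool:
    """None is a wildcard; otherwise compare case-insensitively."""
    return query is None or value.lower() == query.lower()


def search(index: List[Fact], subject: Optional[str] = None,
           verb: Optional[str] = None, obj: Optional[str] = None) -> List[str]:
    """Return sentences whose extracted fact fits the subject-verb-object query."""
    return [f.sentence for f in index
            if matches(f.subject, subject) and matches(f.verb, verb)
            and matches(f.obj, obj)]


# Hypothetical index entries; a real service would build these from a full
# parse of the corpus rather than by hand.
INDEX = [
    Fact("p53", "activate", "p21", "p53 directly activates p21 transcription."),
    Fact("MDM2", "inhibit", "p53", "MDM2 inhibits p53-mediated transactivation."),
    Fact("p53", "bind", "DNA", "The p53 core domain binds DNA."),
]

print(search(INDEX, verb="activate"))
print(search(INDEX, obj="P53"))
```

Because the query is matched against roles rather than keywords, searching for p53 as an object returns only the sentence in which p53 is acted upon, which plain keyword search could not distinguish.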
Gene/gene products you are interested in
Fields
By clicking this button, you can restrict search fields.
By clicking this button, you can restrict species.
GeneBoxes
Drag this GeneBox to the Interaction Viewer
Drag this InteractionBox to the ContentViewer
Sentence Box
Property indicating that the co-occurrence in the sentence is direct evidence of an interaction
Property indicating that the co-occurrence in the sentence is a mere co-occurrence
Possible end-user service based on AZ
Goes beyond Google's PageRank™, because the citation links are typed.
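One possible reading of "typed links" is a PageRank variant in which each citation edge is weighted by the type Argumentative Zoning assigns to it, so that being built upon (BAS) counts for more than being contrasted with (CTR). The weights below are hypothetical and purely illustrative; the slide does not specify how types would be used.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per citation type; the values are illustrative only.
TYPE_WEIGHT = {"BAS": 2.0, "OTH": 1.0, "CTR": 0.5}


def typed_pagerank(edges: List[Tuple[str, str, str]], damping: float = 0.85,
                   iterations: int = 100) -> Dict[str, float]:
    """Weighted PageRank by power iteration.

    edges: (citing paper, cited paper, citation type). A paper's outgoing
    rank is split in proportion to the type weights of its citations.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    out_weight = {n: 0.0 for n in nodes}
    for src, _dst, ctype in edges:
        out_weight[src] += TYPE_WEIGHT[ctype]
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by papers with no outgoing citations is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, dst, ctype in edges:
            new[dst] += damping * rank[src] * TYPE_WEIGHT[ctype] / out_weight[src]
        rank = new
    return rank


# Paper A builds on C but argues against D.
print(typed_pagerank([("A", "C", "BAS"), ("A", "D", "CTR")]))
```

With all weights equal this reduces to ordinary PageRank; the types only change how a citing paper's rank is divided among the papers it cites, so total rank still sums to one.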
NaCTeM and Social Science/Humanities
• In Year 3 (from October 2006), develop a pilot service aimed at social science.
• Local links with NCESS (National Centre for e-Social Science)
• Preparatory invited workshop held in May 2006
• Text-mining and Digitised C19th Research Resources Workshop with the British Library
Workshop on Text Mining in Social Sciences
Presentations available on the NaCTeM Web page
– Bridging qualitative and quantitative methods for social sciences using text mining techniques (Sophia Ananiadou)
– Text Mining Activities at the National Centre (Sophia Ananiadou, Jun'ichi Tsujii, Paul Watry)
– Smart Qualitative Data: Methods and Community Tools for Data Mark-Up (SQUAD) (Louise Corti)
– Author Identification (Katerina T. Frantzi)
– Sentiment Analysis and Financial Grids (Lee Gillam)
– Concordances and semi-automatic coding in qualitative analysis: possibilities and barriers (Graham R. Gibbs)
– Bridging quantitative and qualitative methods for social sciences using text mining techniques (Tetsuya Nasukawa)
– Computer-Assisted Content Analysis (Andrew Wilson)
NaCTeM status
• NaCTeM is almost at the end of its tool development phase
• Moving to deployment of services this Autumn
• Will include domain-independent terminology management from the outset
• Other applications of interest to social science researchers should appear in approximately one year's time.