Post on 25-Dec-2015


Page 1: Datamining MEDLINE for Topics and Trends in Dental and Craniofacial Research William C. Bartling, D.D.S. NIDCR/NLM Fellow in Dental Informatics Center

Datamining MEDLINE for Topics and Trends in Dental and Craniofacial Research

William C. Bartling, D.D.S.
NIDCR/NLM Fellow in Dental Informatics
Center for Biomedical Informatics
University of Pittsburgh

Titus K. L. Schleyer, D.M.D., Ph.D.
Director, Center for Dental Informatics
University of Pittsburgh School of Dental Medicine

Page 2

Overview

Goals of project
Retrieving the entire corpus of dental and craniofacial research literature from MEDLINE
Determining the characteristics of a dental research article
Machine learning to extract articles from any body of literature
Methods to categorize dental research literature to study temporal trends
Summary

Page 3

Goals of project

To use computerized methods to determine topics and trends in dental and craniofacial research since 1966.
Determining the structure of such research can help to identify those research areas emerging and those waning.
Identify research funding opportunities?

Page 4

Retrieving the dental literature

MEDLINE chosen as the database
MeSH tree searched manually for dental and craniofacial terms
Many MeSH terms were found in unusual locations in the hierarchy.
Decision to keep or discard term
Search limited to:
– English language
– Journal article
– Abstract present

Page 5

Results of search

~450,000 English language articles in:
– DENTISTRY
– STOMATOGNATHIC SYSTEM (not PHARYNX)
– STOMATOGNATHIC DISEASES (not PHARYNGEAL DISEASES)
~61,000 articles indexed with dental MeSH terms not in above set
~134,000 articles remaining after limiting to journal articles containing abstracts
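The search described on the two slides above could be written as a single boolean query. The sketch below assumes PubMed-style field tags (`[MeSH Terms]`, `[Language]`, `[Publication Type]`, `hasabstract`); the slides do not say which search interface was used or how the exclusions were expressed, and the function names are hypothetical. It also leaves out the ~61,000 articles found through MeSH terms outside these three branches.

```python
def mesh(term):
    """Format a MeSH heading with a PubMed-style [MeSH Terms] field tag."""
    return f'"{term}"[MeSH Terms]'


def build_dental_query():
    """Assemble an illustrative query for the three MeSH branches on the slide."""
    branches = [
        mesh("Dentistry"),
        f'({mesh("Stomatognathic System")} NOT {mesh("Pharynx")})',
        f'({mesh("Stomatognathic Diseases")} NOT {mesh("Pharyngeal Diseases")})',
    ]
    # Limits listed on the slide: English language, journal article, abstract present.
    limits = [
        "english[Language]",
        "journal article[Publication Type]",
        "hasabstract",
    ]
    return f'({" OR ".join(branches)}) AND ' + " AND ".join(limits)


if __name__ == "__main__":
    print(build_dental_query())
```

Keeping the branches and the limits in separate lists makes it easy to add further manually selected MeSH terms without touching the limits.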

Page 6

What is a dental research article?

Currently at this phase of project
1000 abstracts randomly chosen, 5 groups of 200 each
15 expert judges
3 judges assigned to each group
Judges categorize each article as:
– Dental or craniofacial research
– Dental or craniofacial, non-research
– Non-dental
– Not sure
Web interface for judging: PHP with MySQL
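With three judges assigning each abstract to one of four categories, agreement within a group can be summarized with a chance-corrected statistic. The sketch below uses Fleiss' kappa purely as an illustration; the slides do not name the reliability measure that was used, and the function name and sample ratings are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (same rater count per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement for each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items

    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return 1.0  # every rating fell in a single category
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Columns: research, non-research, non-dental, not sure (3 judges per abstract).
    ratings = [[3, 0, 0, 0], [2, 1, 0, 0], [0, 0, 3, 0], [0, 2, 0, 1]]
    print(round(fleiss_kappa(ratings), 3))
```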

Page 7
Page 8

Page 9

Differentiation of article categories

Acceptable reliability in each group (> 0.70)
Use results of each category to develop training set
Identify Patient Sets (IPS) software
– Developed by Dr. Greg Cooper at University of Pittsburgh CBMI
– Natural language processing used to find patient records of a certain type from free-text documents, e.g. hospital admission records

Page 10

IPS creates a document vector for each document or set of documents

[Figure: Document i shown as a vector of word entries: (Word 1, p1), (Word 2, p2), (Word 3, p3), ..., (Word n, pn)]
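A document vector of this kind can be sketched in a few lines. The slide does not define the values p1 ... pn, so the example below assumes each one is the relative frequency of the word within the document; the tokenization rule and the function name are likewise assumptions and not a description of how IPS works internally.

```python
import re
from collections import Counter


def document_vector(text):
    """Map each word in the text to its relative frequency (assumed meaning of p)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    vector = document_vector("Caries and caries risk")
    for word, p in sorted(vector.items()):
        print(word, p)
```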

Page 11

Identify Patient Sets (IPS)

Uses the machine learning technique of “text classification”
All articles fed into the program
– Select fields (title, abstract, MeSH terms)
Training set:
– 2/3 of validated “dental research” articles
Add remaining 1/3 to original set, less the training set
Calculate success of retrieval using model created from training set
Adjust IPS and iterate, or train with more or fewer documents until successful
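The train-and-evaluate loop on this slide can be sketched independently of IPS itself. The example below only shows the 2/3 split of validated articles and one common way to score retrieval (precision and recall); the slide does not say how "success of retrieval" was measured, and all names here are hypothetical.

```python
import random


def split_validated(validated_ids, train_fraction=2 / 3, seed=0):
    """Randomly split validated article IDs into training and held-out sets."""
    ids = sorted(validated_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])


def retrieval_scores(retrieved, relevant):
    """Return (precision, recall) of the retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    train, held_out = split_validated(range(30))
    # The held-out third is mixed back into the corpus (minus the training set);
    # a trained model is then judged on how many held-out articles it retrieves.
    retrieved = set(list(held_out)[:8]) | {100, 101}  # pretend model output
    print(retrieval_scores(retrieved, held_out))
```

Fixing the random seed keeps the split reproducible across iterations, so a change in the scores reflects a change in the model and not in the sample.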

Page 12

Determining trends and topics in dental and craniofacial research

Entire set of dental research articles used
Knowledge visualization and bibliometric methods
Based on the assumption that articles in a given field are similar to one another (Hearst & Pedersen, 1996)
Similar articles and topics tend to cluster together

Page 13

Bibliometric examples from other fields

Co-word analysis
– Software engineering (Coulter, Monarch, and Konda, 1998)
Co-descriptor analysis
– Information science (McCain, 1995)
Co-author analysis
– Information retrieval literature (Ding et al., 1999)
Co-citation analysis
– Medical informatics literature (Morris & McCain, 1998)

Page 14

Visual methods to categorize literature

Co-occurrence vectors or weights
– Weights based on co-occurrence of terms
Multidimensional scaling
– Display of points in two or three dimensions
– Points lie closer together on the display when articles are more similar
Clustering
– Groups of points in close proximity to each other are bounded to provide an intellectual grouping
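Co-occurrence weights of the kind listed above can be computed directly from the indexing terms of each article. The sketch below counts how often two terms appear on the same article and normalizes by the terms' individual frequencies (a cosine-style weight); the slides do not specify a weighting scheme, so this choice, the function name, and the sample terms are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_weights(term_sets):
    """Return {(term_a, term_b): weight} for every pair of terms that
    appear together on at least one article."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in term_sets:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(term_counts[a] * term_counts[b])
        for (a, b), n in pair_counts.items()
    }


if __name__ == "__main__":
    articles = [
        ["Dental Caries", "Fluorides"],
        ["Dental Caries", "Fluorides"],
        ["Dental Caries", "Periodontitis"],
    ]
    for pair, weight in sorted(cooccurrence_weights(articles).items()):
        print(pair, round(weight, 3))
```

A dissimilarity such as `1 - weight` could then serve as input to a multidimensional scaling routine, which places strongly co-occurring terms close together.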

Page 15

Medical Informatics Structure

Page 16

How do we cluster dental research?

Entire text of abstracts
MeSH terms only
– Major headings
– Subheadings
– All MeSH headings
Journal titles
Combinations of the above
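Whichever feature set is chosen, clustering needs a similarity measure between articles and a rule for grouping them. The sketch below is one simple possibility and not the method of the project: it compares articles by the Jaccard overlap of their feature sets (for example MeSH headings) and links any two articles whose overlap reaches a threshold. The names and the threshold are hypothetical, and the pairwise loop is meant for small illustrations, not for a corpus of ~134,000 articles.

```python
def jaccard(a, b):
    """Jaccard overlap of two feature collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_clusters(features, threshold=0.5):
    """Group article IDs so that any two articles with Jaccard overlap
    >= threshold end up in the same cluster (single-link grouping)."""
    ids = list(features)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


if __name__ == "__main__":
    mesh_by_article = {
        "a1": {"Dental Caries", "Fluorides"},
        "a2": {"Dental Caries", "Fluorides", "Child"},
        "a3": {"Cleft Palate", "Surgery"},
    }
    print(threshold_clusters(mesh_by_article))
```

Swapping the feature sets (abstract words, major headings only, journal titles) changes only the input dictionary, which makes the alternatives on the slide easy to compare.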

Page 17

Once clustering is done:

Cluster dental research within certain time periods (5 years)
Determine quantities of articles published for each cluster within each time period
Cluster including only journals with a given impact factor threshold
Study changes over time of different categories of research
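Counting articles per cluster and time period is a simple tabulation once each article has a cluster label and a publication year. The sketch below assumes 5-year periods that start in 1966 (the starting year named in the project goals); the cluster labels and function names are hypothetical.

```python
from collections import Counter


def period_start(year, first_year=1966, width=5):
    """Return the first year of the period that contains the given year."""
    return first_year + ((year - first_year) // width) * width


def articles_per_cluster_period(records, first_year=1966, width=5):
    """Count articles for each (cluster, period start) pair.

    records is an iterable of (cluster_label, publication_year) tuples.
    """
    counts = Counter()
    for cluster, year in records:
        counts[(cluster, period_start(year, first_year, width))] += 1
    return counts


if __name__ == "__main__":
    sample = [("caries", 1967), ("caries", 1970), ("caries", 1972), ("implants", 1999)]
    for key, n in sorted(articles_per_cluster_period(sample).items()):
        print(key, n)
```

Comparing a cluster's counts across successive periods is what would show whether that research area is emerging or waning.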

Page 18

Summary

A comprehensive content analysis of the dental and craniofacial research literature has not been done.
Computerized methods can help to retrieve and categorize this literature.
Study of trends in dental research can help researchers to identify relevance of current studies and possibly reveal future research opportunities.

Page 19

Many thanks to the following:

Amy Gregg, MLIS, Dental Reference Librarian
Falk Library for the Health Sciences
University of Pittsburgh

Shyam Visweswaran, MD, NLM Fellow in Intelligent Systems
Center for Biomedical Informatics
University of Pittsburgh

All of my expert raters!

This research is supported with a training grant from the National Institute of Dental and Craniofacial Research and the National Library of Medicine.