Datamining MEDLINE for Topics and Trends in Dental and Craniofacial Research
William C. Bartling, D.D.S.
NIDCR/NLM Fellow in Dental Informatics
Center for Biomedical Informatics
University of Pittsburgh
Titus K. L. Schleyer, D.M.D., Ph.D.
Director, Center for Dental Informatics
University of Pittsburgh School of Dental Medicine
Overview
Goals of project
Retrieving the entire corpus of dental and craniofacial research literature from MEDLINE
Determining the characteristics of a dental research article
Machine learning to extract articles from any body of literature
Methods to categorize dental research literature to study temporal trends
Summary
Goals of project
To use computerized methods to determine topics and trends in dental and craniofacial research since 1966.
Determining the structure of such research can help to identify which research areas are emerging and which are waning.
Identify research funding opportunities?
Retrieving the dental literature
MEDLINE chosen as the database
MeSH tree searched manually for dental and craniofacial terms
Many MeSH terms were found in unusual locations in the hierarchy.
Decision to keep or discard term
Search limited to:
– English language
– Journal article
– Abstract present
Results of search
~450,000 English language articles in:
– DENTISTRY
– STOMATOGNATHIC SYSTEM (not PHARYNX)
– STOMATOGNATHIC DISEASES (not PHARYNGEAL DISEASES)
~61,000 articles indexed with dental MeSH terms not in above set
~134,000 articles remaining after limiting to journal articles containing abstracts
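The search described on the two slides above can be sketched as a boolean query string. The slides do not say which search interface was used, so the PubMed-style field tags below ([MeSH Terms], [Language], [Publication Type], hasabstract) are an assumption, and build_query is a hypothetical helper. The additional dental MeSH terms found outside the three main branches are not listed in the slides and are therefore left out.

```python
def build_query(include_terms, exclude_by_term, limits):
    """Combine MeSH terms and search limits into one boolean query string."""
    clauses = []
    for term in include_terms:
        clause = f'"{term}"[MeSH Terms]'
        excluded = exclude_by_term.get(term)
        if excluded:
            # Drop the excluded sub-branch, e.g. PHARYNX under STOMATOGNATHIC SYSTEM.
            clause = f'({clause} NOT "{excluded}"[MeSH Terms])'
        clauses.append(clause)
    topic = "(" + " OR ".join(clauses) + ")"
    return " AND ".join([topic] + limits)


# Terms and exclusions taken from the slides; the tag syntax is assumed.
INCLUDE = ["Dentistry", "Stomatognathic System", "Stomatognathic Diseases"]
EXCLUDE = {
    "Stomatognathic System": "Pharynx",
    "Stomatognathic Diseases": "Pharyngeal Diseases",
}
LIMITS = ["english[Language]", '"journal article"[Publication Type]', "hasabstract"]

query = build_query(INCLUDE, EXCLUDE, LIMITS)
print(query)
```

Keeping the terms, exclusions, and limits in separate lists makes it easy to rerun the search after a decision to keep or discard a term.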
What is a dental research article?
Currently at this phase of project
1000 abstracts randomly chosen, 5 groups of 200 each
15 expert judges
3 judges assigned to each group
Judges categorize each article as:
– Dental or craniofacial research
– Dental or craniofacial, non-research
– Non-dental
– Not sure
Web interface for judging: PHP with MySQL
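Agreement among the three judges of a group can be quantified with a chance-corrected statistic. The slides do not name the statistic used; Fleiss' kappa, which handles more than two raters, is one common option and is shown here as an assumption. The category labels and the toy ratings are illustrative only.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-article label lists.

    Every article must have been rated by the same number of judges.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for labels in ratings:
        counts = Counter(labels)
        totals.update(counts)
        # Share of judge pairs that agree on this article.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))
    observed = agreement_sum / n_items
    # Agreement expected by chance from the overall category proportions.
    expected = sum((n / (n_items * n_raters)) ** 2 for n in totals.values())
    if expected == 1.0:
        return 1.0  # every judge used one single category throughout
    return (observed - expected) / (1 - expected)


# Toy data: four articles, three judges each (not real study data).
toy_ratings = [
    ["research", "research", "research"],
    ["non-research", "non-research", "non-research"],
    ["non-dental", "non-dental", "non-dental"],
    ["research", "research", "non-research"],
]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3))
```

How "Not sure" votes are treated (as a fourth category or excluded) changes the result, so that choice should be fixed before comparing groups.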
Differentiation of article categories
Acceptable reliability in each group (> 0.70)
Use results of each category to develop training set
Identify Patient Sets (IPS) software
– Developed by Dr. Greg Cooper at University of Pittsburgh CBMI
– Natural language processing used to find patient records of a certain type from free text documents, i.e. hospital admission records
IPS creates a document vector for each document or set of documents
[Figure: Document i represented as a vector of words (Word 1, Word 2, Word 3, ..., Word n) with associated values p1, p2, p3, ..., pn]
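A document vector of this kind can be sketched in a few lines. The slides do not define the values p1 ... pn, so treating them as the relative frequency of each word within the document is an assumption, as is the simple lowercase tokenizer; IPS itself may weight words differently.

```python
import re
from collections import Counter


def document_vector(text):
    """Map each word in the text to its relative frequency in that text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Illustrative input, not taken from the study corpus.
vector = document_vector("Dental caries and dental plaque")
for word, p in sorted(vector.items()):
    print(word, round(p, 2))
```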
IDENTIFY PATIENT SETS (IPS)
Uses machine learning technique of “text classification”
All articles fed into the program
– Select fields (title, abstract, MeSH terms)
Training set:
– 2/3 of validated “dental research” articles
Add remaining 1/3 to original set, less the training set
Calculate success of retrieval using model created from training set
Adjust IPS and iterate, or train with more or fewer documents until successful
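The train-then-evaluate loop above can be illustrated with a small stand-in classifier. The slides do not describe how IPS builds its model, so the multinomial naive Bayes classifier below is a swapped-in technique, not the IPS algorithm. As a simplification, the sketch splits a labelled set of both classes 2/3 to 1/3 rather than mixing the held-out third back into the full corpus; all documents and labels are toy data.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(docs):
    """docs: list of (text, label). Returns word counts, label counts, vocabulary."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def split_two_thirds(docs, seed=0):
    """Shuffle reproducibly and return (training 2/3, held-out 1/3)."""
    shuffled = docs[:]
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def precision_recall(model, test_docs, target):
    """Precision and recall for retrieving documents labelled `target`."""
    tp = fp = fn = 0
    for text, label in test_docs:
        predicted = classify(model, text)
        if predicted == target and label == target:
            tp += 1
        elif predicted == target:
            fp += 1
        elif label == target:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training and held-out documents (hypothetical).
toy_train = [
    ("dental caries enamel study", "research"),
    ("periodontal trial enamel outcome", "research"),
    ("cardiac surgery outcome", "other"),
    ("renal dialysis cardiac", "other"),
]
toy_test = [("enamel caries trial", "research"), ("cardiac dialysis", "other")]
model = train(toy_train)
print(precision_recall(model, toy_test, "research"))
```

Precision and recall on the held-out third are one way to "calculate success of retrieval"; if they are too low, the training set size or the selected fields can be varied and the loop repeated.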
Determining trends and topics in dental and craniofacial research
Entire set of dental research articles used
Knowledge visualization and bibliometric methods
Based on the assumption that articles in a given field are similar to one another (Hearst & Pedersen, 1996)
Similar articles and topics tend to cluster together
Bibliometric examples from other fields
Co-word analysis
– Software engineering (Coulter, Monarch, and Konda, 1998)
Co-descriptor analysis
– Information science (McCain, 1995)
Co-author analysis
– Information retrieval literature (Ding et al., 1999)
Co-citation analysis
– Medical informatics literature (Morris & McCain, 1998)
Visual methods to categorize literature
Co-occurrence vectors or weights
– Weights based on co-occurrence of terms
Multidimensional scaling
– Display of points in two or three dimensions
– Points closer together on matrix when articles are more similar
Clustering
– Groups of points in close proximity to each other are bounded to provide an intellectual grouping
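Two of the steps above, co-occurrence weights and clustering, can be sketched without any plotting. The slides do not specify a weighting scheme or a clustering algorithm, so the Jaccard weight and the single-link threshold grouping below are assumptions chosen for brevity; the multidimensional scaling step is omitted. The MeSH-like terms are toy data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(articles):
    """articles: list of term sets. Returns a Jaccard weight per co-occurring pair."""
    term_freq = Counter()
    pair_freq = Counter()
    for terms in articles:
        unique = sorted(set(terms))
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        pair: n / (term_freq[pair[0]] + term_freq[pair[1]] - n)
        for pair, n in pair_freq.items()
    }


def cluster_terms(weights, threshold):
    """Single-link grouping: terms linked by a weight >= threshold share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in weights.items():
        root_a, root_b = find(a), find(b)
        if weight >= threshold and root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())


# Toy indexing data (hypothetical articles).
toy_articles = [
    {"Dental Caries", "Fluorides"},
    {"Dental Caries", "Fluorides", "Dental Enamel"},
    {"Periodontitis", "Dental Plaque"},
    {"Periodontitis", "Dental Plaque", "Fluorides"},
]
weights = cooccurrence_weights(toy_articles)
clusters = cluster_terms(weights, threshold=0.5)
print(clusters)
```

Raising the threshold splits the terms into more, tighter groups; lowering it merges them, which is the same trade-off faced when drawing cluster boundaries on a scaled map.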
Medical Informatics Structure
How do we cluster dental research?
Entire text of abstracts
MeSH terms only
– Major headings
– Subheadings
– All MeSH headings
Journal titles
Combinations of the above
Once clustering is done:
Cluster dental research within certain time periods (5 years)
Determine quantities of articles published for each cluster within each time period
Cluster including only journals with a given impact factor threshold
Study changes over time of different categories of research
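Counting articles per cluster and time period is a simple tally once each article carries a publication year and a cluster label. The sketch below assumes 5-year periods starting in 1966, the start year given in the project goals; the record layout and cluster names are hypothetical.

```python
from collections import Counter


def period_start(year, first_year=1966, width=5):
    """First year of the fixed-width period that contains `year`."""
    return first_year + ((year - first_year) // width) * width


def articles_per_cluster_period(records, first_year=1966, width=5):
    """records: iterable of (publication_year, cluster_label) pairs."""
    counts = Counter()
    for year, cluster in records:
        counts[(cluster, period_start(year, first_year, width))] += 1
    return counts


# Toy records (not real study data).
toy_records = [
    (1966, "caries"),
    (1970, "caries"),
    (1971, "caries"),
    (1998, "implants"),
    (1999, "implants"),
]
trend = articles_per_cluster_period(toy_records)
for (cluster, start), n in sorted(trend.items()):
    print(cluster, f"{start}-{start + 4}", n)
```

Comparing the counts for one cluster across consecutive periods gives a first indication of whether that research area is growing or shrinking.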
Summary
A comprehensive content analysis of the dental and craniofacial research literature has not been done.
Computerized methods can help to retrieve and categorize this literature.
Study of trends in dental research can help researchers to identify relevance of current studies and possibly reveal future research opportunities.
Many thanks to the following:
Amy Gregg, MLIS, Dental Reference Librarian
Falk Library for the Health Sciences
University of Pittsburgh
Shyam Visweswaran, MD, NLM Fellow in Intelligent Systems
Center for Biomedical Informatics
University of Pittsburgh
All of my expert raters!
This research is supported with a training grant from the National Institute of Dental and Craniofacial Research and the National Library of Medicine.