![Page 1: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/1.jpg)
Understanding Health Information Behaviors in Social Q&A:
Text Mining of Health Questions in Yahoo! Answers
Florida State University School of Library & Information Studies
College of Communication and Information
Sanghee Oh
MinSook Park
[email protected]@my.fsu.edu
![Page 2: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/2.jpg)
Hello ALISE, I am David!
![Page 3: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/3.jpg)
Social Media and Health
• 2013 Pew Research Center Report (Fox & Duggan, 2013)
– 16% of those who seek health information online look for others who have similar health concerns.
– 26% of the Internet users have read the personal experiences of others pertaining to their health conditions.
– 30% of them referred to online reviews or rankings of health care services or treatments during the past year.
• How America Searches: Health and Wellness (iCrossing, 2008)
– Social media is the third most popular online tool people use to locate health information (34%), following general search engines (67%) and health portals (46%).
– Wikipedia is the most frequently used social media service for health information (21%), followed by online forums (15%), social networks (6%), video-sharing sites (5%), and blogs (4%).
![Page 4: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/4.jpg)
Social Q&A
• Web-based service allowing people to ask and answer one another in many different topic areas
• Free and easy to access and use
• People can benefit from the varying levels of knowledge, expertise, and experiences.
• People can elaborate on their information needs in questions or describe sources of information in answers with their own words, explaining their diseases, medical histories, conditions, or resources with as much (or as little) detail as they wish.
• Examples
– Yahoo! Answers
– WikiAnswers
– AnswerBag
![Page 5: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/5.jpg)
![Page 6: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/6.jpg)
Research Questions
What disease-specific information (e.g., prevention, risk factors, symptoms, diagnosis, treatments) are people most likely to discuss in health questions?
What are the personal experiences, expertise, and resources people share in health questions?
What are the social and emotional supports people would like to receive or share in health questions?
How have the findings from the research questions above evolved over time, from 2009 to 2012?
![Page 7: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/7.jpg)
Method
• Test bed: Yahoo! Answers
– About 20 health-topic categories are available (e.g., Cancer, Women’s Health, Dental, Diabetes, Sexually Transmitted Diseases).
![Page 8: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/8.jpg)
Data Collection
• Collecting up to 5,000 health-related questions and corresponding answers per day, using the Yahoo! Answers API (Application Programming Interface).
• Collecting data about questions, answers, best answers, resources (references), ratings, timestamps, user nick names, etc.
• Approx. 1 million questions and 5 million answers are available for the analysis.
• # of health-related questions posted in 2012: 468,655
• # of corresponding health-related answers to the questions: 1,267,554
• # of STD questions posted between 2009 and 2012: 69,363
![Page 9: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/9.jpg)
Text Mining
• To observe health information needs presented in a large and complex collection of health questions from a social Q&A service, Yahoo! Answers
• Interpretation of the results from text mining could be mostly based on terms without considering the contexts. Thus, content analysis of the questions was carried out prior to text mining in order to capture the contexts of the information behaviors of the questioners.
Content Analysis (1,118 questions) → Information Framework of Health Questions Development → Text Mining (69,363 questions)
![Page 10: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/10.jpg)
Text-Mining Software
Dataset: 69,363 health questions about STDs posted from 2009 to 2012.
• IBM SPSS Modeler Premium: Text Analytics
Text mining software is designed to analyze unstructured data, extracting words and concepts from texts and identifying the relationships among them using predictive models.
Extracting words and concepts from texts, using MeSH (Medical Subject Headings) and a customized dictionary for STDs
Counting the frequency of the concepts
Extracted concepts were grouped into the categories of the information framework developed by content analysis in a previous study.
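The extraction, counting, and grouping steps above can be illustrated with a minimal Python sketch. This is not the IBM SPSS Modeler workflow used in the study; it is a simplified dictionary lookup, and the term lists, the category mapping, and the sample questions are hypothetical stand-ins for MeSH, the customized STD dictionary, and the real data.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: concept -> surface terms. The study used MeSH
# plus a customized STD dictionary; these entries are illustrative only.
CONCEPT_TERMS = {
    "herpes": ["herpes", "hsv"],
    "HIV": ["hiv"],
    "test": ["test", "tested", "testing"],
    "worried": ["worried", "worry"],
}

# Hypothetical mapping of concepts to categories of the information framework.
CONCEPT_CATEGORY = {
    "herpes": "Disease",
    "HIV": "Disease",
    "test": "Test",
    "worried": "Emotions",
}


def extract_concepts(question):
    """Return the set of concepts whose terms appear as whole words."""
    tokens = set(re.findall(r"[a-z']+", question.lower()))
    return {c for c, terms in CONCEPT_TERMS.items() if tokens & set(terms)}


def count_questions(questions):
    """Count questions per concept and per category (each question counts once)."""
    concept_counts, category_counts = Counter(), Counter()
    for q in questions:
        concepts = extract_concepts(q)
        concept_counts.update(concepts)
        category_counts.update({CONCEPT_CATEGORY[c] for c in concepts})
    return concept_counts, category_counts


# Made-up sample questions, not taken from the dataset.
questions = [
    "Should I get tested for HIV?",
    "I am worried these bumps are herpes.",
    "Can herpes show up on an HIV test?",
]
concept_counts, category_counts = count_questions(questions)
print(concept_counts.most_common())
print(category_counts.most_common())
```

Counting each question at most once per concept matches the "No. of Questions" framing of the tables, as opposed to counting every occurrence of a term.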
![Page 11: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/11.jpg)
Data Analysis Process
[Diagram stages: Yahoo! Answers, Data Collection, Research Database, Text Preparation, Concept Extraction]
• Extract concepts and calculate frequencies of questions associated with each concept (e.g., STDs: 18,229; herpes: 15,432; HIV: 11,739; doctor: 10,168; test: 7,543; symptoms: 7,259; AIDS: 5,669; …)
• Generate concept maps and identify the relationships/similarity of the terms in health questions
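One possible way to score the relationships between concepts for a concept map is sketched below. It uses Jaccard similarity over the sets of questions in which two concepts co-occur; this measure is an assumption chosen for illustration, since the slides do not state which similarity measure the software applied. The function name `jaccard_links` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard_links(question_concepts, min_similarity=0.0):
    """Score concept pairs by the Jaccard similarity of the questions they occur in."""
    concept_freq = Counter()
    pair_freq = Counter()
    for concepts in question_concepts:
        unique = sorted(set(concepts))  # sorted so each pair has one canonical order
        concept_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    links = {}
    for (a, b), both in pair_freq.items():
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = concept_freq[a] + concept_freq[b] - both
        score = both / union
        if score >= min_similarity:
            links[(a, b)] = score
    return links


# Made-up concept sets, one per question.
question_concepts = [
    {"herpes", "bumps", "doctor"},
    {"herpes", "bumps"},
    {"HIV", "test", "doctor"},
    {"HIV", "test"},
]
links = jaccard_links(question_concepts)
for pair, score in sorted(links.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs that never co-occur get no link, and a map limited to a maximum number of concepts could keep only the strongest links for a chosen focus concept.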
![Page 12: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/12.jpg)
STD Concept Extraction (5,000 concepts)
Rank  Major Concepts  No. of Questions
1     STDs            18229
2     Sex             17945
3     Herpes          15432
4     help            15029
5     HIV             11739
6     Doctor          10168
7     Vagina          7866
8     Boyfriend       7800
9     Condom          7737
10    Test            7543
11    Symptoms        7259
12    Guy             7052
13    Bumps           6361
14    Question        6211
15    Girl            6014
16    AIDS            5669
17    Feel            5639
18    Need            5435
19    Day             5428
20    Penis           5382
Table 1. The top 20 most popular concepts in STD questions.
![Page 13: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/13.jpg)
What is the disease specific information people would most likely discuss in health questions and answers?
Number of Questions in Each Category
Disease: 54959
Risk Factors: 43818
Symptom: 35016
Relationship: 34792
Body part/Body System: 33342
Treatments: 21616
Prevention/Causes/Transmission: 16665
Daily lives: 15936
Test: 15361
Emotions: 10340
Diagnosis: 967
![Page 14: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/14.jpg)
Types of STDs
Rank  Major Concepts       No. of Questions
1     STDs                 18229
2     Herpes               15432
3     HIV                  11739
4     AIDS                 5669
5     HPV                  4173
6     Chlamydia            4548
7     Genital warts        2767
8     Yeast infection      2448
9     Syphilis             903
10    Hepatitis            583
11    Bacterial Vaginosis  529
12    Trichomoniasis       312
13    Gonorrhea            15
Table 2. The top 13 most frequently discussed STD diseases
![Page 15: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/15.jpg)
Concept Map: Herpes (Maximum concepts on map: 30)
![Page 16: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/16.jpg)
Concept Map: Herpes (Maximum concepts on map: 30)
![Page 17: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/17.jpg)
Concept Map: Herpes (Maximum concepts on map: 30)
![Page 18: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/18.jpg)
What are the personal experiences, expertise, and resources people share in health questions?
Table 3. The top 20 most popular life issues
Daily Lives
Rank  Concepts            No. of Questions
1     virginity           3875
2     life                3145
3     pregnancy           2584
4     baby                976
5     kids                975
6     situation           872
7     health              871
8     birth control       766
9     health insurance    656
10    money               517
11    effect              471
12    planned parenthood  441
13    paperwork           416
14    lead                359
15    pay                 321
16    cost                313
17    lie                 307
18    future              286
19    infertility         271
20    marriage            220
![Page 19: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/19.jpg)
What are the social and emotional supports people would like to receive or share in health questions?
Rank  Concepts      No. of Questions
1     freaking      1213
2     worried       980
3     I don't know  718
4     love          640
5     fear          334
6     trust         329
7     anxiety       325
8     mistake       290
9     hate          287
10    doubt         278
11    nasty         243
12    concern       235
13    embarrassing  169
14    panic         149
15    ease          144
16    regret        138
17    hypochondria  107
18    fault         90
19    relief        86
20    pleasure      86
Table 4. The top 20 most frequently discussed Emotions
[Chart: Number of Questions per emotion concept; the chart additionally shows peace: 75]
![Page 20: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/20.jpg)
How have the findings from the research questions above evolved over time, from 2009 to 2012?
[Chart: x-axis months from Sep-09 to Dec-12; y-axis 0.0% to 35.0%. Series: STDs, HIV, herpes, Human Papillomavirus (HPV), chlamydia, genital warts, yeast infection, gonorrhea, AIDS, Bacterial Vaginosis (BV), Hepatitis, Syphilis, Trichomoniasis]
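A trend series of this kind could be computed roughly as in the sketch below. It assumes the percentage is the share of each month's questions that mention a given concept; the slides do not define the y-axis, so this is only one plausible reading. The function name `monthly_share` and the records are made up for illustration.

```python
from collections import Counter
from datetime import date


def monthly_share(records, concept):
    """Return {(year, month): percent of that month's questions mentioning concept}."""
    totals = Counter()
    hits = Counter()
    for posted, concepts in records:
        key = (posted.year, posted.month)
        totals[key] += 1
        if concept in concepts:
            hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in sorted(totals)}


# Made-up records: (posting date, concepts extracted from the question).
records = [
    (date(2009, 9, 3), {"herpes"}),
    (date(2009, 9, 17), {"HIV"}),
    (date(2009, 10, 1), {"herpes", "test"}),
    (date(2009, 10, 9), {"herpes"}),
]
shares = monthly_share(records, "herpes")
for (year, month), pct in shares.items():
    print(f"{year}-{month:02d}: {pct:.1f}%")
```

Normalizing by the monthly total keeps months with different posting volumes comparable, which raw counts would not.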
![Page 21: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/21.jpg)
Discussion / Implication
• Text mining has been an effective method with which to understand and identify relationships among concepts in a large dataset.
• Text mining will continue to identify the information people seek and share in health questions and answers in social Q&A.
• Findings could be beneficial for health information professionals to better understand the health information needs and behaviors of people in real life.
• Findings could inform the design, evaluation, or improvement of services and systems to help guide people in making informed health care decisions.
• The proposed method is applicable to analyzing questions and answers in other topic areas as well as in examining information shared in other types of social media (e.g., wall messages in social networking sites, tweets, blogs, wikis).
![Page 22: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/22.jpg)
![Page 23: Understanding Health Information Behaviors in Social Q&A: Text Mining of Health Questions in Yahoo! Answers Florida State University School of Library](https://reader031.vdocuments.us/reader031/viewer/2022022404/56649dbc5503460f94aade38/html5/thumbnails/23.jpg)
Thank you!
Questions & Comments?
• Sanghee Oh, assistant professor at Florida State University (FSU)
• Contact Information
o Office: 1-850-645-2493
o Email: [email protected]
o Personal Website: http://shoh.cci.fsu.edu
o Research Website: http://socialqa.cci.fsu.edu