Detecting Communities in Science Blogs
DESCRIPTION
A structural exploration of the science blogosphere that uses social network analysis to identify central actors and cohesive subgroups. Presented at the 4th IEEE eScience Conference (2008) in Indianapolis, IN, on 12/10/2008.
TRANSCRIPT
![Page 1: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/1.jpg)
Detecting Communities in Science Blogs
Christina K. Pikas
[email protected]
http://terpconnect.umd.edu/~cpikas/ScienceBlogging
![Page 2: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/2.jpg)
Problem Area
• eScience includes using electronic tools both for conducting science and for communicating about science
• There is an abundance of tools, both online and offline, to help scientists communicate
• Many scientists and interested members of the public maintain blogs (~2500?)
• Ultimate Questions: Why? With whom are scientists communicating? What are scientists communicating about? What is the value to the scientists and to science?
![Page 3: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/3.jpg)
Specific Problem Addressed
• What is the nature of the science blogosphere?
– What is its shape?
– Who are the central participants?
– What is the connectivity?
– Where are the potential information flows?
![Page 4: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/4.jpg)
Outline
• Background
• Methods
– Data gathering
– Analysis
• Results
• Discussion
![Page 5: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/5.jpg)
Background: Blogs
• Defined by format
– Individual posts, with permanent URLs
– Comments
• Links
– In content
– In blogroll
– In comments and trackbacks
• Community develops around single blogs and among blogs through commenting
![Page 6: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/6.jpg)
Posts
Links to Static Pages
Links and automatically generated content
http://dorigo.wordpress.com/
![Page 7: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/7.jpg)
Access to posts by search, and to older posts using the calendar
A list of most recent posts is automatically generated
![Page 8: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/8.jpg)
A list of categories the blogger used to describe his posts. Clicking will list all of the posts in that category.
The blogroll is a list of blogs the author reads or endorses to some extent.
Access to the older posts by month.
![Page 9: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/9.jpg)
The individual post page looks a lot like the blog home page
![Page 10: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/10.jpg)
But with Comments, which may be signed with the commenter’s URL
And a form to leave your own comment. Typically your e-mail will not appear on the site
![Page 11: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/11.jpg)
Background: Social Network Analysis
• Uses connections between actors to understand potential flows of information and influence
• Uses graph theoretic methods to find
– Central or prestigious actors
– Cohesive subgroups including communities
![Page 12: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/12.jpg)
Methods: Sample Selection
Operational Definition of Science Blog
• Blogs maintained by scientists that deal with any aspect of being a scientist
• Blogs about scientific topics by non-scientists
Omitted
• Primarily political speech
• Ones maintained by corporations
• Non-English language
![Page 13: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/13.jpg)
Methods: Data Gathering
• Two Networks: Links and Commenters
• Link Data (Blogroll)
– Used seed list developed in previous study using directories and searches
– Snowball sampled using links from blogrolls
– Visited and copied links
• Commenter Data
– Selected most central blogs from blogroll data
– Used Perl scripts to pull the commenter URLs from each of the last 10 posts
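The snowball step for the link data can be sketched roughly as follows. This is a minimal illustration in Python, not the procedure used in the study (where links were visited and copied by hand). The `get_blogroll` callable, the `max_waves` cutoff, and the toy data are all assumptions made for the example.

```python
from collections import deque

def snowball_sample(seeds, get_blogroll, max_waves=3):
    """Breadth-first snowball sample starting from a seed list.

    get_blogroll(url) is a hypothetical callable that returns the URLs
    listed in that blog's blogroll. max_waves is an assumed cutoff.
    """
    seen = set(seeds)
    arcs = []                                  # (source blog, blog it links to)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        blog, wave = queue.popleft()
        if wave >= max_waves:
            continue                           # sampled, but not expanded further
        for target in get_blogroll(blog):
            arcs.append((blog, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, wave + 1))
    return seen, arcs

# Toy blogroll data, made up for illustration.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": ["a"]}
nodes, arcs = snowball_sample(["a"], lambda url: toy.get(url, []), max_waves=2)
```

Blogs reached in the final wave are kept as nodes but their own blogrolls are not followed, which is one common way to bound a snowball sample.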
![Page 14: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/14.jpg)
Methods: Analysis
• Used social network analysis and graphing software
• Examined graph and calculated basic descriptive statistics
• Found centrality and prestige measures
– Degree: the number of links in and out of a node
– Betweenness: the number of shortest paths that pass through a node
– Closeness: how short the paths are from a node to the other nodes
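To make the three measures concrete, here is a small standard-library sketch. It is not the software used in the study. It assumes a directed graph stored as a dictionary of successor sets, measures closeness along outgoing paths (an assumption; incoming paths are an equally valid choice for prestige), and leaves betweenness unnormalized.

```python
from collections import deque

def degree(adj):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    out_deg = {v: len(ws) for v, ws in adj.items()}
    in_deg = {v: 0 for v in adj}
    for ws in adj.values():
        for w in ws:
            in_deg[w] += 1
    return in_deg, out_deg

def closeness(adj):
    """Reachable nodes divided by total distance to them (0.0 if none)."""
    result = {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}            # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}           # predecessors on shortest paths
        dist, order, queue = {s: 0}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):              # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Toy graph: every node must appear as a key, even with no outgoing links.
toy = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"b"}}
in_deg, out_deg = degree(toy)
close = closeness(toy)
between = betweenness(toy)
```

In the toy graph, `b` sits on both the a-to-c and d-to-c paths, so it is the only node with non-zero betweenness.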
![Page 15: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/15.jpg)
Methods: Analysis
Located cohesive subgroups
• Link methods
– Components
– LS Sets
• Clustering methods
• Community detection techniques
– Newman-Girvan
– Spin Glass
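As an illustration of two of the techniques listed above, the sketch below finds components and then performs the first division of the Newman-Girvan approach: repeatedly remove the edge with the highest shortest-path betweenness until the graph splits. It treats the graph as undirected and free of self-loops, which is an assumption; the slides do not say how link direction was handled. The full method keeps dividing and selects a partition (for example by modularity), which is not shown, and the spin glass method is not covered.

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph, as a list of sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in part:
                    part.add(w)
                    queue.append(w)
        seen |= part
        parts.append(part)
    return parts

def edge_betweenness(adj):
    """Shortest-path betweenness per edge (each node pair is counted twice)."""
    score = {}
    for s in adj:
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        dist, order, queue = {s: 0}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    return score

def first_split(adj):
    """Remove highest-betweenness edges until the component count rises."""
    adj = {v: set(ws) for v, ws in adj.items()}    # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break                                   # no edges left to remove
        v, w = tuple(max(scores, key=scores.get))
        adj[v].discard(w)
        adj[w].discard(v)
    return components(adj)

# Toy graph: two triangles joined by the single bridge c-d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
parts = first_split(toy)
```

Every shortest path between the two triangles crosses the bridge, so it is removed first and the two triangles emerge as separate groups.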
![Page 16: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/16.jpg)
Results: Link Analysis (Blogroll)
• One large component
• There were 1091 nodes, 6621 arcs
• Diameter is 9
• In-degree ranges from 1 to 292, with a median of 3 and a mean of 6
– 10 of the top 20 blogs by in-degree are authored or co-authored by women
– 4 of the top 5 blogs by closeness are authored or co-authored by women
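The in-degree summary can be reproduced from an arc list with a few lines of Python. This is a sketch on made-up data, not the study's network. Because every arc adds exactly one to some node's in-degree, the mean in-degree equals arcs divided by nodes, which is consistent with the figures reported here: 6621 / 1091 is roughly 6.07.

```python
from statistics import mean, median

def in_degree_summary(nodes, arcs):
    """Min, max, median and mean in-degree for a directed arc list."""
    in_deg = {n: 0 for n in nodes}
    for _source, target in arcs:
        in_deg[target] += 1
    values = list(in_deg.values())
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "mean": mean(values),
    }

# Toy data, made up for illustration: "b" is the one heavily linked blog.
toy_nodes = ["a", "b", "c", "d"]
toy_arcs = [("a", "b"), ("c", "b"), ("d", "b"),
            ("b", "a"), ("b", "c"), ("a", "d")]
summary = in_degree_summary(toy_nodes, toy_arcs)

# Consistency check on the reported totals (arcs / nodes).
reported_mean = round(6621 / 1091, 2)
```

A mean well above the median, as in the reported 6 versus 3, is what a small number of very heavily linked blogs would produce.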
![Page 17: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/17.jpg)
![Page 18: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/18.jpg)
Results: Commenter
• 5 components: the largest with 911 nodes, the others with 11 or fewer nodes each
• 938 nodes (starting with the 46 selected blogs), 1152 arcs
• The largest component has a diameter of 5
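The diameter is the longest of all shortest paths within a component. A rough sketch of how it can be computed for the largest component follows; it ignores arc direction, which is an assumption, since the slides do not state how direction was treated. The toy arcs are made up.

```python
from collections import deque

def undirected(arcs):
    """Build an undirected adjacency map from a list of arcs."""
    adj = {}
    for a, b in arcs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def reach(adj, source):
    """Return (longest shortest-path length from source, set of reached nodes)."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values()), set(dist)

def largest_component_diameter(arcs):
    """Return (size, diameter) of the largest component, ignoring direction."""
    adj = undirected(arcs)
    seen, best = set(), set()
    for v in adj:
        if v not in seen:
            _, part = reach(adj, v)
            seen |= part
            if len(part) > len(best):
                best = part
    diameter = max((reach(adj, v)[0] for v in best), default=0)
    return len(best), diameter

# Toy data: a four-node chain plus a separate pair.
toy_arcs = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
size, diameter = largest_component_diameter(toy_arcs)
```

Running one breadth-first search per node is fine at the scale of roughly a thousand nodes, though it would be slow on much larger networks.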
![Page 19: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/19.jpg)
![Page 20: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/20.jpg)
Discussion: Links (Blogroll)
• Most of the blogs were connected in one dense component
– A result of the diffusion of blogs?
• There were a few very central blogs and many less central ones
– Typical skewed distribution
• The community of women scientists merits further study
![Page 21: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/21.jpg)
Discussion: Commenters
• Analysis easily located a notorious commenter who leaves incendiary comments on physics and chemistry blogs
– High out-degree, no links in
• Traffic on the women scientist blogs is more uniform, with frequent comments that are widely distributed among the blogs
– Indicates a different use
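The "high out-degree, no links in" pattern is simple to screen for. In the sketch below, arcs are assumed to point from the commenter's URL to the blog commented on, and the `min_out` threshold is a hypothetical choice; neither detail comes from the slides. The toy data is made up.

```python
def out_only_nodes(arcs, min_out=3):
    """Nodes with at least min_out outgoing arcs and no incoming arcs."""
    out_deg, in_deg = {}, {}
    for commenter, blog in arcs:
        out_deg[commenter] = out_deg.get(commenter, 0) + 1
        in_deg[blog] = in_deg.get(blog, 0) + 1
    return sorted(
        node for node, count in out_deg.items()
        if count >= min_out and in_deg.get(node, 0) == 0
    )

# Toy data: "troll" comments on three blogs and receives no comments.
toy_arcs = [("troll", "p1"), ("troll", "p2"), ("troll", "p3"),
            ("p1", "p2"), ("p2", "p1"), ("reader", "p1")]
flagged = out_only_nodes(toy_arcs)
```

Many ordinary readers also have no incoming links, so the out-degree threshold does most of the work, and any flagged node still needs to be checked by reading the comments.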
![Page 22: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/22.jpg)
Take Home Messages
• The science blogosphere is densely connected with many opportunities for influence and information diffusion
• Communities tend to form within disciplinary boundaries
• An exception is the community of women scientist bloggers, who come from many different disciplines
![Page 23: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/23.jpg)
Acknowledgements
• Thanks to Dr. Jen Golbeck for supervising this work as part of an independent study
• Thanks also to
– Dr. Alan Neustadtl for SNA advice
– Dr. Dagobert Soergel for research advice
![Page 24: Detecting Communities in Science Blogs](https://reader035.vdocuments.us/reader035/viewer/2022062702/5549b2a9b4c905e5048b46bd/html5/thumbnails/24.jpg)
Christina K. Pikas
Doctoral Student
University of Maryland
College of Information Studies
http://terpconnect.umd.edu/~cpikas/ScienceBlogging