www.tugraz.at
WISSEN · TECHNIK · LEIDENSCHAFT (Knowledge · Technology · Passion)
Science 2.0 VU: E-Science, E-Infrastructures, Content/Data Mining
WS 2014/15
Elisabeth Lex, KTI, TU Graz
Agenda
• Recap from last time: altmetrics
• E-Science
• E-Infrastructures
• Content/Data Mining
Altmetrics (recap)
"Altmetrics is the creation and study of new metrics based on the Social Web for analyzing and informing scholarship." (Altmetrics Manifesto, http://altmetrics.org/about)
• Aggregated from many sources (e.g. Twitter, Mendeley, GitHub, SlideShare, ...)
• Article-Level Metrics (ALM): a multidimensional suite of transparent and established metrics at the article level
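A minimal sketch of what "aggregated from many sources" means for article-level metrics: individual events (a tweet, a Mendeley reader, a blog post) are rolled up into per-article counts per source. The `(doi, source)` event format and the DOIs under the `10.1000` test prefix are hypothetical illustrations, not the schema of any real altmetrics provider.

```python
from collections import Counter, defaultdict

def article_level_metrics(events):
    """Roll up (doi, source) events into per-article counts per source.

    The event format is a hypothetical illustration, not a provider's schema.
    """
    metrics = defaultdict(Counter)
    for doi, source in events:
        metrics[doi][source] += 1
    return {doi: dict(counts) for doi, counts in metrics.items()}

events = [
    ("10.1000/a", "twitter"), ("10.1000/a", "twitter"),
    ("10.1000/a", "mendeley"), ("10.1000/b", "blog"),
]
print(article_level_metrics(events))
# {'10.1000/a': {'twitter': 2, 'mendeley': 1}, '10.1000/b': {'blog': 1}}
```

Keeping the counts per source (rather than one summed number) preserves the "multidimensional" character of ALM.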
Examples of altmetrics sources (recap)
• Usage: views, downloads, ...
• Captures: bookmarks, readers, ...
• Mentions: blog posts, news stories, Wikipedia articles, comments, reviews
• Social media: tweets, Google+, Facebook likes, shares, ratings
• Citations: Web of Science, Scopus, Google Scholar, ...
Example: Altmetric.com
Source: http://www.altmetric.com/details.php?domain=www.altmetric.com&citation_id=843656
Lessons learned (recap)
• Alternative ways to assess the impact of various scientific outputs
• No common understanding of altmetrics yet
  • What do they really express?
  • Are they useful, and for which part of the research process?
• Not necessarily "better" metrics
  • E.g. gamification
• Can help to get an overview of a research field
  • Visualizations based on altmetrics
e-Science, e-Infrastructures, Content Mining
Modern Science: What has changed?
• 150 years later: searching for new particles like the Higgs boson with the Large Hadron Collider
• Built in collaboration with over 10,000 scientists and engineers from over 100 countries and hundreds of universities and laboratories; housed in a tunnel 27 km in circumference, up to 175 m deep, near Geneva
Motivation
• The Internet and science disciplines (e.g. physical sciences, biological sciences, medicine, and engineering) generate large and complex datasets (Big Data)
  • These require more advanced database and architectural support
• A "new kind of research methodology" has emerged: the fourth paradigm of scientific exploration (Hey, 2007)
  • Based on statistical exploration of large amounts of data
→ This led to e-Science
Source: http://www.ksi.mff.cuni.cz/astropara/
e-Science
• Large-scale science (since 1999)
• Data-driven discovery
• Focus on computationally intensive science and how to tackle it using highly distributed environments
  • Powerful computers: supercomputers, High Performance Computing (HPC), grids, ...
  • Distributed computing
  • Powerful research infrastructures ("e-infrastructures"): grids, clouds
Source: http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores/3
Supercomputers
• Large, expensive systems, usually housed in a single room, in which multiple processors are connected by a fast local network
• Suited to highly complex, real-time applications and simulations
• Pros: data can move between processors rapidly → all processors can work together on the same tasks
• Cons: expensive to build and maintain; they do not scale well (e.g. adding more processors is challenging)
Sources: http://www.top500.org/lists/2014/06/, http://www.wikihow.com/Build-a-Supercomputer
Distributed Computing
• Systems in which processors are not necessarily located close to one another (they can even be housed on different continents) but are connected via the Internet or other networks
• Pros: much less expensive than supercomputers
• Cons: lower speed than supercomputers
Example: Hadoop
• Ecosystem of tools for processing big data
• Simple computational model: a two-stage method for processing large amounts of data
  • Design an algorithm that operates on one chunk of the data in two stages (a Map stage and a Reduce stage); MapReduce automatically distributes that algorithm across the cluster → the framework hides the complexity
Sources: http://hadoop.apache.org, http://architects.dzone.com/articles/how-hadoop-mapreduce-works
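To make the two-stage model concrete, here is a single-process Python simulation of the classic word-count job. It is not Hadoop code (real jobs are written against Hadoop's Java API or run via Hadoop Streaming); it only mimics the map → shuffle → reduce flow that the framework distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Map: emit (word, 1) for every word in one chunk of the input."""
    for word in chunk.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key (Hadoop does this between the stages)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(chunks):
    # In Hadoop each chunk would be mapped on a different node;
    # here everything runs sequentially in one process.
    intermediate = chain.from_iterable(map_phase(c) for c in chunks)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())

chunks = ["big data big science", "science needs data"]
print(map_reduce(chunks))
# {'big': 2, 'data': 2, 'science': 2, 'needs': 1}
```

The programmer only writes `map_phase` and `reduce_phase`; splitting the input, shuffling and running the stages in parallel is what the framework hides.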
Hadoop in e-Science. Example: Astronomical Image Processing
• Large telescopes survey the sky over a prolonged period of time
• Large Synoptic Survey Telescope (LSST), under construction:
  • will capture half of the sky over 10 years
  • 30 TB of data every night, ~60 PB in 10 years
• Astronomers pick out faint objects for study by capturing multiple images of the same area and combining them ("coaddition")
• Challenge: how to organize and process all the resulting data
Source: http://www.lsst.org/lsst/
Using Hadoop to help with image coaddition
Source: http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop
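A minimal sketch of how coaddition maps onto the MapReduce pattern, under simplifying assumptions: each exposure is a hypothetical record `{"tile": ..., "pixels": [...]}` that is already aligned to a common pixel grid, and coaddition is a plain pixel-wise mean. Real pipelines (including the one linked above) also warp images onto a common grid and weight them; none of that is shown here.

```python
import random
from collections import defaultdict

def map_exposure(exposure):
    """Map: key an exposure by the sky tile it covers."""
    return exposure["tile"], exposure["pixels"]

def reduce_tile(pixel_lists):
    """Reduce: pixel-wise mean over all exposures of one tile."""
    n = len(pixel_lists)
    return [sum(values) / n for values in zip(*pixel_lists)]

def coadd(exposures):
    grouped = defaultdict(list)  # the "shuffle": collect exposures per tile
    for tile, pixels in map(map_exposure, exposures):
        grouped[tile].append(pixels)
    return {tile: reduce_tile(stack) for tile, stack in grouped.items()}

# A faint source (pixel 2) whose brightness is half the noise level of a single exposure
random.seed(0)
signal = [0.0, 0.0, 5.0, 0.0]
exposures = [
    {"tile": "T1", "pixels": [s + random.gauss(0, 10) for s in signal]}
    for _ in range(400)
]
print([round(v, 1) for v in coadd(exposures)["T1"]])
```

Averaging N exposures reduces independent noise by roughly a factor of √N (here 10 → about 0.5 for N = 400), which is why the faint source becomes visible only after coaddition.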
Example: Big Data in Science - European Exascale Projects
Source: http://exascale-projects.eu
Exascale computing: computers capable of at least one exaflops (10^18 floating-point operations per second) → not yet achieved; as of 2014, the fastest systems operate at the petaflops scale (10^15)
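A quick worked comparison of the two scales. The workload W = 10^21 floating-point operations is a hypothetical example, and the runtimes assume the machine sustains its peak rate R, which real applications rarely do.

```latex
\frac{1\,\text{EFLOPS}}{1\,\text{PFLOPS}}
  = \frac{10^{18}\,\text{FLOP/s}}{10^{15}\,\text{FLOP/s}} = 10^{3}

t = \frac{W}{R}: \quad
\frac{10^{21}\,\text{FLOP}}{10^{15}\,\text{FLOP/s}} = 10^{6}\,\text{s} \approx 11.6\ \text{days},
\qquad
\frac{10^{21}\,\text{FLOP}}{10^{18}\,\text{FLOP/s}} = 10^{3}\,\text{s} \approx 16.7\ \text{min}
```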
Virtual Science Environments
• Not only HPC, but also the sharing of knowledge and data, is becoming a requirement for scientific discovery
  • Provide useful mechanisms to facilitate this sharing
  • Preserve and organize research data
→ Virtual Science Environments: "virtual environments in which researchers work together through ubiquitous, trusted and easy access to services for scientific data, computing and networking, enabled by e-Infrastructures"
Defining e-Infrastructures
European e-Infrastructure Reflection Group (e-IRG):
‘The term e-Infrastructure refers to this new research environment in which all researchers—whether working in the context of their home institutions or in national or multinational scientific initiatives—have shared access
to unique or distributed scientific facilities (including data, instruments, computing and communications),
regardless of their type and location in the world.’
Source: http://www.e-irg.eu/about-e-irg.html
e-Infrastructures - Goals
• Opening access to knowledge through reliable, distributed and participatory data e-infrastructures
• Cost-effective infrastructures for the preservation and curation of data for re-use
• Persistent availability of information and linking people and data through flexible and robust digital identifiers
• Interoperability for consistency of approaches on global data exchange (e.g. standards)
• Enabling trust through authentication and authorisation mechanisms
Source: http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf
Example: e-Infrastructure OpenAIRE
• The European Open Access Data Infrastructure for Scholarly and Scientific Communication
• Functionality:
  • Harvests and stores information about publications from various repositories (via OAI-PMH)
  • Enables searching for publications and related information (e.g. funding, ...)
  • Provides a list of OA repositories that can be used to store publications
  • Orphan repository (for researchers without an OA repository of their own)
  • Shows statistics of stored data
Source: https://www.openaire.eu
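The harvesting step relies on OAI-PMH, a standard protocol in which a harvester sends HTTP requests with a `verb` and pages through results via a `resumptionToken`. The sketch below shows the protocol's shape only; it is not OpenAIRE's implementation, the repository URL and sample response are hypothetical, and it makes no network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request asking for Dublin Core (oai_dc) records."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the last page

# Hypothetical response from a hypothetical repository
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai"))
print(parse_list_records(SAMPLE))
```

A harvester would loop: request, parse, and request again with the returned token until it comes back empty.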
OpenAIRE - Applications
Example: e-Infrastructures Austria 1/2
Source: http://www.e-infrastructures.at
Example: e-Infrastructures Austria 2/2
e-Science, e-Infrastructures, Content Mining
Content Mining - Motivation
• Make science discoverable
• Extract facts for research
• Build reusable objects
• Aggregate
• Create new businesses
• Check for errors → better science
Content Mining
→ Extracting, processing and republishing content, manually or by machine
• Content: text, numerical data, static images, videos, audio, metadata, bibliographic data or any other digital information, and/or a combination of them → all types of information
• Mining: large-scale extraction of information from target content
Source: http://access.okfn.org/2014/03/27/what-is-content-mining/
Data Mining
Content mining vs. data mining → content mining is more generic
Content Mining Problems
• Secondary publishers create walled gardens, e.g. the ResearchGate portal
• Publishers' contracts ban content mining
• Publishers may cut off universities that mine
• Publishers lobby governments to require "licences for content mining"
• UK → "the right to read is the right to mine"
Source: http://blogs.ch.cam.ac.uk/pmr/2013/10/02/text-and-data-mining-fighting-for-our-digital-future-peter-murray-rust-is-the-problem/
Example: ContentMine
Idea:
• Facts cannot be copyrighted
• Billions of facts are contained in copyright-protected research articles
→ Make them publicly accessible!
Source: http://contentmine.org
Content to mine in a scientific paper
[Figure: annotated example paper; labels include date, researcher, resource]
Machine Extraction of Scientific Facts
1. Crawl the scientific literature
2. Scrape each scientific article
3. Extract facts
4. Index
5. Republish (Wikidata)
Source: https://github.com/ContentMine
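The five steps can be sketched as a toy pipeline. This is not ContentMine's code: crawling is stubbed with an in-memory dict of hypothetical pages, the only "fact" extracted is a cited DOI (using a simplified DOI pattern), and republishing to Wikidata is replaced by producing subject–predicate–object tuples.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Simplified DOI pattern: "10." + registrant code + "/" + suffix
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def scrape(html):
    """Step 2: strip markup and keep the article's plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def extract_facts(text):
    """Step 3 (toy): treat cited DOIs as the facts to extract."""
    return {m.rstrip(".;,") for m in DOI_RE.findall(text)}

def build_index(crawled):
    """Step 4: inverted index, fact -> set of article URLs."""
    index = defaultdict(set)
    for url, html in crawled.items():
        for fact in extract_facts(scrape(html)):
            index[fact].add(url)
    return index

def republish(index):
    """Step 5: emit (subject, predicate, object) statements."""
    return sorted((url, "cites", doi) for doi, urls in index.items() for url in urls)

crawled = {  # Step 1 (crawling) stubbed with hypothetical pages
    "http://example.org/a": "<p>We build on <i>10.1000/xyz123</i>.</p>",
    "http://example.org/b": "<p>See 10.1000/xyz123 and 10.1000/abc.</p>",
}
for statement in republish(build_index(crawled)):
    print(statement)
```

Because only facts (here: which article cites which DOI) are republished, not the articles' text, this mirrors the "facts cannot be copyrighted" idea from the ContentMine slide.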
Example: retrieving metadata for a specific article
26/11/14
Example: Measuring the quality of Wikipedia
Source: Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein, and Michael Granitzer. 2012. Measuring the quality of web content using factual information. In Proceedings of WebQuality '12 at WWW '12.
Figure 1: Histograms of Wikipedia corpora for (a) the unbalanced dataset and (b) the balanced dataset.
… is the word count of t, and t is a Wikipedia article. The same holds for "Factual-density/sentence-count".
The word count measure outperforms the factual density measure normalized to sentence count as well as the one normalized to word count on the unbalanced corpus. Apparently, word count is a strong feature on the unbalanced corpus.
We then evaluated the factual density measure on the balanced corpus, where featured/good and non-featured articles are more similar with respect to document length. The results for this experiment are shown in Figure 2(b) as precision-recall curves. On the balanced corpus, factual density normalized to sentence count as well as to word count performs much better than on the unbalanced corpus, while word count, as expected, performs worse. There is not much difference between normalizing to word count or to sentence count, since here the number of words per document has a smaller influence on the result.
We also analyzed the distributions of featured/good and non-featured articles when factual density is used as the measure, as depicted in Figure 3. We found that the distribution of the featured/good articles is clearly separated from the distribution of the non-featured articles, with peaks at two different factual density values (0.06 and 0.03, respectively). This is in contrast to the distributions of featured/good and non-featured articles when word count is used, which overlap to a high degree, as shown in Figure 1(b). Consequently, on the balanced corpus, factual density clearly outperforms our baseline, word count.
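Factual density, as used in the excerpt above, is the number of facts extracted from a document divided by its length (word count or sentence count). A minimal sketch, assuming the facts are already given as triples: in the paper they come from ReVerb, whereas here they are written by hand, and the sentence splitter is deliberately naive.

```python
import re

def word_count(text):
    return len(text.split())

def sentence_count(text):
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace or end of text
    return len([s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()])

def factual_density(facts, text, normalize_by="words"):
    """Number of extracted facts divided by document length."""
    if normalize_by == "words":
        size = word_count(text)
    elif normalize_by == "sentences":
        size = sentence_count(text)
    else:
        raise ValueError("normalize_by must be 'words' or 'sentences'")
    return len(facts) / size if size else 0.0

# Hand-written (hypothetical) triples standing in for ReVerb output
text = "Graz is a city in Austria. It hosts TU Graz."
facts = [("Graz", "is a city in", "Austria"), ("It", "hosts", "TU Graz")]
print(factual_density(facts, text))               # 0.2  (2 facts / 10 words)
print(factual_density(facts, text, "sentences"))  # 1.0  (2 facts / 2 sentences)
```

Normalizing by length is what makes the measure less sensitive to document size than raw word count, which is the point of the balanced-corpus experiment.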
In a related experiment, we investigated the relational information contained in the binary relationships that ReVerb extracts from sentences. We used the relations, i.e. only the predicates of the extracted triples, as a vocabulary to represent the documents. We then tested the discriminative power of these features by training a classifier to solve the binary classification problem of distinguishing featured/good from non-featured articles. The results reported in Table 1 were obtained using the WEKA⁶ implementation of a Naive Bayes classifier in combination with feature selection based on Information Gain (IG). From 40 000 relations, we selected the 10% best features in terms of IG. We achieved similar results for both corpora.
⁶ http://www.cs.waikato.ac.nz/~ml/weka/
Figure 3: Distribution of articles by factual density.
Table 1: Classification results using relational features on both corpora.

| Measure | Unbalanced [%] | Balanced [%] |
|---|---|---|
| Accuracy | 84.01 | 87.14 |
| F-Measure | 84 | 86.7 |
| Precision | 84 | 89.2 |
| Recall | 84 | 87.1 |
Apparently, relational features are more robust when the document length varies. However, we need to investigate this in more detail.
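The feature-selection step in the excerpt (rank relations by Information Gain, keep the best 10%) can be sketched from scratch. This is not WEKA's implementation: it treats each relation as a binary presence feature, which is an assumption, and uses hypothetical toy documents. The selected features would then be fed to a Naive Bayes classifier (not shown).

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, feature):
    """IG(C; f) = H(C) - sum over v of P(f=v) * H(C | f=v), for a binary presence feature."""
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (with_f, without_f) if part)
    return entropy(labels) - conditional

def select_top_features(docs, labels, fraction=0.10):
    """Rank all features by IG and keep the best `fraction` of them (at least one)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda f: information_gain(docs, labels, f), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

# Hypothetical documents, each represented by the set of relation phrases it contains
docs = [{"is located in", "was born in"}, {"is located in", "won"}, {"was born in"}, {"won"}]
labels = ["featured", "featured", "non-featured", "non-featured"]
print(select_top_features(docs, labels))  # ['is located in']
```

In this toy data, "is located in" appears only in featured articles and therefore separates the classes perfectly (IG = 1 bit), while the other relations carry no information (IG = 0).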
Possible questions for content mining
• Find references to papers by a given author. This is metadata and therefore factual. It is usually trivial to extract references and authors; disambiguating them is, of course, more difficult.
• Find papers about Science 2.0 in German. Highly tractable: a typical approach is to find the 50 most common words in a paper (e.g. "ein", "das", ...) and show that their frequencies are very different from those in English ("one", "the", ...).
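A simplified variant of the language check described above: instead of comparing full frequency profiles of the 50 most common words, it counts how many tokens belong to a small, hand-picked list of function words per language. The word lists are hypothetical and far too short for real use.

```python
from collections import Counter

# Tiny hand-picked function-word lists (hypothetical); a real system would use
# the ~50 most frequent words per language, estimated from a corpus.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ein", "eine", "ist", "nicht", "mit", "von"},
    "en": {"the", "and", "a", "an", "is", "not", "with", "of", "to", "in"},
}

def guess_language(text):
    """Return the language whose function words occur most often, or None if none occur."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    scores = Counter({lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()})
    lang, hits = scores.most_common(1)[0]
    return lang if hits else None

print(guess_language("Das ist ein Artikel über Science 2.0 und offene Daten."))  # de
print(guess_language("This is a paper about the open data movement."))           # en
```

Function words work well here because they are frequent in every text regardless of topic, so even a short abstract usually contains several of them.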
Example: Facilitating exploratory search in social bookmarking sites
• by topic of interest
• Setting: social bookmarking dataset, URLs described by tags
  • Dataset size: 61,665 posts (~430,000 triples)
• Research questions:
  • What groups of interest exist?
  • Are they related to each other?
  • How do they evolve over time?
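The slides do not spell out the method used for these questions, so the following is a swapped-in, deliberately simple technique and not the approach of the original study: build a tag co-occurrence graph from (user, URL, tag) triples (two tags are linked when they describe the same URL) and treat its connected components as groups of interest. The triples are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def tag_cooccurrence(triples):
    """Count how often two tags are assigned to the same URL."""
    tags_per_url = defaultdict(set)
    for _user, url, tag in triples:
        tags_per_url[url].add(tag)
    weights = defaultdict(int)
    for tags in tags_per_url.values():
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return weights

def interest_groups(weights, min_weight=1):
    """Connected components of the tag graph, keeping edges with weight >= min_weight."""
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

triples = [  # hypothetical (user, url, tag) triples
    ("u1", "http://a", "python"), ("u1", "http://a", "programming"),
    ("u2", "http://b", "python"), ("u2", "http://b", "programming"),
    ("u2", "http://c", "recipes"), ("u3", "http://c", "cooking"),
    ("u3", "http://d", "hiking"),
]
print(interest_groups(tag_cooccurrence(triples)))
# [['cooking', 'recipes'], ['programming', 'python']]
```

Raising `min_weight` keeps only strongly co-occurring tags. Tags that never co-occur with others (here "hiking") form no group. Running the same analysis per time window would be one way to look at how groups evolve.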
Approach
Take away message
• e-Science: data-driven, large-scale science
  • Supercomputers and distributed computing
• Virtual research environments
  • e-Infrastructures
• Mining content/data in large repositories
  • E.g. fact extraction
  • E.g. exploratory analysis of large datasets
    • Find groups of interest expressed by user-generated tags, and their relations
Your Assignment!
Assignment 1/2
• Implementation (50%)
1. Compute altmetrics (25 pts)
  • Use rOpenSci to first search arxiv.org for papers related to a topic of your choice and then retrieve their altmetrics with rAltmetric (http://ropensci.org)
  • Result: list of 10 DOIs from arxiv.org with the corresponding altmetrics from altmetric.com (10 pts)
  • Plot and interpret the results
  • Result: plot and textual interpretation (15 pts)
2. Use the #altmetrics14 Twitter collection (25 pts): http://figshare.com/articles/An_altmetrics14_Twitter_Archive/1151577
  • Extract mentions of users (user A mentions user B in a tweet)
  • Result: table with the columns userid, userscreenname, mentions (10 pts)
  • Plot the mentions in a matrix
  • Result: plot and textual interpretation (15 pts)
Assignment 2/2
• Report (25 pts)
  • Collect related work in a Mendeley group (tag it with your name) (5 pts)
  • Upload your paper and your source code to Mendeley (tag them with submission_yourname, e.g. submission_xyz) (5 pts)
  • Write a scientific paper (4 pages) (15 pts)
• Presentation (25 pts): present your paper in class
  • Motivate the work you have done (e.g. why altmetrics) in 1 slide
  • Present your results and how you got them
  • Bonus points: present your own ideas for the Twitter dataset and how you would tackle them → further bonus points if you implement them :)
Part 1: Compute altmetrics
Short intro to R
• The R Project for Statistical Computing: http://www.r-project.org
• Free software environment for statistical computing and graphics
• Classification, clustering, statistical tests, time-series analysis, ...
• Simple way to produce "publication-ready" plots
• Available for Windows, Unix and OS X from a CRAN mirror
The package rAltmetric
• Package for retrieving altmetric data for publications from altmetric.com
• Altmetric tracks what people are saying about papers online on behalf of publishers, authors, libraries and institutions
• http://cran.r-project.org/web/packages/rAltmetric/
• http://ropensci.github.io/rAltmetric/
• 2 major functions:
  • altmetrics(): download metrics
  • altmetric_data(): extract data
• Plus: functions to plot/print metrics
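The assignment asks for rAltmetric in R; for readers who want to see the same download-then-extract idea outside R, here is a Python sketch. The endpoint layout (`/v1/doi/<doi>`) and the field names `score` and `cited_by_tweeters_count` are assumptions based on Altmetric's public v1 API and should be checked against its documentation. The sketch works on already-downloaded, hypothetical JSON and makes no network call.

```python
import json
from urllib.parse import quote

API_BASE = "https://api.altmetric.com/v1"  # assumed public v1 endpoint

def doi_url(doi):
    """URL of the Altmetric record for one DOI (endpoint layout is an assumption)."""
    return f"{API_BASE}/doi/{quote(doi, safe='/')}"

def summarize(record):
    """Keep a few fields of one record; field names are assumptions, missing values default to 0."""
    return {
        "doi": record.get("doi"),
        "score": float(record.get("score", 0)),
        "tweeters": int(record.get("cited_by_tweeters_count", 0)),
    }

def top_n(records, n=10):
    """Rank records by Altmetric score, highest first."""
    return sorted(map(summarize, records), key=lambda r: r["score"], reverse=True)[:n]

# Hypothetical, already-downloaded JSON responses
raw = ['{"doi": "10.1000/a", "score": 3.5, "cited_by_tweeters_count": 4}',
       '{"doi": "10.1000/b", "score": 12.0}']
for row in top_n([json.loads(r) for r in raw]):
    print(row)
```

Separating download (`doi_url` plus an HTTP client of your choice) from extraction (`summarize`) mirrors the split between `altmetrics()` and `altmetric_data()` in rAltmetric.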
Example
Part 2: Use #altmetrics14 Twitter collection
Analysis of the Twitter dataset
• Determine the impact of users at a scientific conference
• Extract mentions of users (user A mentions user B in a tweet)
• Plot the mentions in a matrix and interpret the results
• Example: https://www.miskatonic.org/2013/02/22/one-last-c4l13-tweet-thing-who-mentioned-whom/
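A minimal sketch of the mention-extraction step: find `@username` tokens in each tweet, count (author, mentioned user) pairs and arrange them in a square matrix that could then be plotted as a heat map. The column layout of the #altmetrics14 archive is not described here, so the sketch assumes tweets are already loaded as hypothetical `(author, text)` pairs. It counts retweets ("RT @user") as mentions, which is a design choice you may want to change.

```python
import re
from collections import Counter

# '@' + up to 15 word characters, not preceded by a word character (skips e-mail addresses)
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")

def extract_mentions(tweets):
    """Count (author, mentioned user) pairs over (author, text) tweets."""
    pairs = Counter()
    for author, text in tweets:
        for target in MENTION_RE.findall(text):
            pairs[(author.lower(), target.lower())] += 1
    return pairs

def mention_matrix(pairs):
    """Rows = mentioning user, columns = mentioned user, cells = mention counts."""
    users = sorted({user for pair in pairs for user in pair})
    index = {user: i for i, user in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for (src, dst), count in pairs.items():
        matrix[index[src]][index[dst]] += count
    return users, matrix

tweets = [  # hypothetical sample, not taken from the #altmetrics14 archive
    ("alice", "Great talk by @bob at #altmetrics14"),
    ("bob", "Thanks @alice! cc @carol"),
    ("alice", "@bob slides please?"),
]
users, matrix = mention_matrix(extract_mentions(tweets))
for user, row in zip(users, matrix):
    print(user, row)
```

Row sums show how active a user is in mentioning others; column sums (how often a user is mentioned) are one simple indicator of a user's impact at the conference.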
Write a scientific paper about your work
• 4 pages, Springer LNCS format: http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
• Structure of your paper:
  • Abstract (a short, complete summary of the paper with key findings)
  • Introduction and Related Work (describes the theoretical background, indicates why the work is important, states a research question)
  • Experiments and Results
  • Conclusion
  • References
Presentation
• Workshop style: presentation and discussion
  • Present your work in max. 10 min + 5 min for questions from the audience
  • No exam situation :)
  • Attendance is mandatory, though
No plagiarism allowed! But: this is Open Science, so if you use the work of others, cite it properly; if you use the work of your colleagues, cite them and give them credit!