![Page 1: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/1.jpg)
Social Mining & Big Data Analytics
H2020 - www.sobigdata.eu
September 2015 - August 2019
@SoBigData (https://twitter.com/SoBigData)
https://www.facebook.com/SoBigData
![Page 2: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/2.jpg)
The Consortium
![Page 3: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/3.jpg)
Delft 17 – 19 February 2016
Integrating national research
Infrastructures
![Page 4: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/4.jpg)
SoBigData is…
A Multidisciplinary European Infrastructure on Big Data and Social
Data Mining providing an integrated ecosystem for ethic-sensitive
scientific discoveries and advanced applications of social data
mining on the various dimensions of social life, as recorded by “big
data”.
![Page 5: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/5.jpg)
SMARTCATs
The GOAL of the Research Infrastructure
• to integrate key national infrastructures and centres of excellence at European level in social mining and big data analytics
• to enable cutting-edge, multi-disciplinary social mining & responsible data science experiments leveraging the Research Infrastructure assets: big data sets, analytical tools and services, and data scientist skills
• to grant access (both online and on-site) to multidisciplinary scientists, innovators, public bodies, citizen organizations, SMEs, as well as data science students at any level of education.
![Page 6: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/6.jpg)
The pillars for reaching the goal
• a distributed data ecosystem for procurement, access and curation of big social data
• a distributed platform of interoperable, social data mining methods and associated “data scientist” skills for mining, analysing, and visualising complex and massive datasets
• a community of multidisciplinary scientists, innovators, public bodies, citizen organizations, SMEs, as well as data science students at any level of education, brought together by extensive networking and innovation actions
![Page 7: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/7.jpg)
What are our researchers doing?
• Any responsible data science experiment is composed of:
– data acquisition (open data, crowdsourcing, crowdsensing),
– model building (very complex validation phase),
– creation of an exploration scenario (what-if analysis) (different validation setting).
• This is similar to many other data-driven science processes, but the data are produced by humans.
![Page 8: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/8.jpg)
![Page 9: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/9.jpg)
Exploratories
Social Mining Research Environments tailored to specific multidisciplinary domains
• Promote results sharing among scientists and communities
• Promote the use of the RI through Virtual and Transnational Access
![Page 10: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/10.jpg)
Big Data for Societal Debates
Polarization, controversy and topic trends on societal debates through social media
Led by Aris Gionis and Dominic Rout
![Page 11: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/11.jpg)
Polarized Political Debates
Monitoring Topics across Time and space
![Page 12: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/12.jpg)
Exploratory: Big Data for City of Citizens
Led by Roberto Trasarti
![Page 13: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/13.jpg)
Estimating traffic flows on the road network
[Figure: road network map with labels A, B, C, HW]
![Page 14: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/14.jpg)
Big Data for Well Being and Economic Performance
Deprivation Index (in France) predicted with mobile phone traces
Led by Peep Kungas
![Page 15: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/15.jpg)
Well-being and Economic Performance
Systemic Risk and Gender Diversity
![Page 16: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/16.jpg)
Big Data & Migration Studies
![Page 17: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/17.jpg)
Sentiment Analysis
• Internal and external perception by country
– Index ρ: the ratio between pro-refugee users and anti-refugee users
– Red means a higher predominance of positive sentiment, higher ρ
– Yellow means a higher predominance of negative sentiment, lower ρ
[Figure panels: (a) Overall. (b) Internal perception. (c) External perception. Each with a colour scale from - to +]
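A minimal sketch of how the index ρ described above could be computed, assuming each user has already been classified as pro-refugee or anti-refugee and assigned to a country. The classification step, the function name `rho_by_country`, the stance labels and the sample records are all hypothetical and are not taken from the slides.

```python
from collections import Counter


def rho_by_country(users):
    """Return {country: rho}, with rho = pro-refugee users / anti-refugee users.

    `users` is an iterable of (country, stance) pairs, where stance is
    "pro" or "against" (hypothetical labels). A country without any
    "against" user gets rho = float("inf").
    """
    pro = Counter()
    against = Counter()
    for country, stance in users:
        if stance == "pro":
            pro[country] += 1
        elif stance == "against":
            against[country] += 1
        else:
            raise ValueError(f"unknown stance: {stance!r}")

    result = {}
    for country in set(pro) | set(against):
        if against[country] == 0:
            # No opposing users: the ratio is unbounded.
            result[country] = float("inf")
        else:
            result[country] = pro[country] / against[country]
    return result


# Made-up sample, for illustration only (not real data).
sample = [
    ("IT", "pro"), ("IT", "pro"), ("IT", "against"),
    ("FR", "pro"), ("FR", "against"), ("FR", "against"),
]
rho = rho_by_country(sample)
```

With this definition, ρ above 1 corresponds to a predominance of positive sentiment (red on the maps) and ρ below 1 to a predominance of negative sentiment (yellow).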
![Page 18: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/18.jpg)
Firenze, 14 Nov 2016
SoBigData e-Infrastructure
• An Exploratory is also a Virtual Research Environment (VRE):
– VREs are web-based, community-oriented, comprehensive, flexible, and secure working environments
– VREs are tailored to satisfy the needs of a designated community.
• services for data and methods discovery and access
• collaboration oriented facilities enabling scientists
– DATA: different sharing policies; may be shared or not
– METHODS: web services (executed over a variety of data centers), or downloaded (packages to be executed on the DATA side)
– WORKFLOW: complex analytical process that may imply executions on different sites (currently only description, or on-site execution on some special analytical platforms)
![Page 19: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/19.jpg)
e-infra design
• SoBigData.eu portal
• SoBigData.eu Catalogue: a set of functionalities to search, index and discover all resources (data, models and workflows) (powered by D4Science)
• Virtual Research Environments (Exploratories): functionalities to create, update and operate them (powered by D4Science)
![Page 20: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/20.jpg)
![Page 21: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/21.jpg)
![Page 22: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/22.jpg)
SoBigData example: Resource Catalogue
[Screenshot callouts: Search for datasets and methods; Description; Recent Activities; Action Bar; Recent Products; Statistics]
![Page 23: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/23.jpg)
VRE example: SoBigData VRE
[Screenshot callouts: Application; Posting messages to other VRE users; VRE Abstract; VRE Managers; News Feed; Top Topics; Recent Files]
![Page 24: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/24.jpg)
The ethics of SoBigData
• Gathering large quantities of data may have serious consequences:
– consequences range from personal harm,
– to issues of autonomy, injustice and inequality.
• Making Big Data accessible is a value for democracy
• SoBigData adheres to a value-sensitive design approach:
– design solutions to overcome ethical dilemmas, in this case those between the utility of the data gathered vs. the protection of the individuals subject to the research.
![Page 25: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/25.jpg)
Ethics in practice
• SoBigData has an ethical framework which provides a broad overview of all the ethical concerns of big data.
• But, as per the value-sensitive design (VSD) outlook, data protection is not only the concern of the ethicists. To make the ideals of SoBigData successful, scientific methods also need to be developed in order to embed moral principles in practice.
![Page 26: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/26.jpg)
The ethics of SoBigData
• How do we create an infrastructure in which such methods can be disseminated and improved upon?
• The Data Management Plan plays a key role:
– Each dataset has its own privacy requirements, fact checks and responsibilities
• Anonymization techniques are part of the research
• Researchers will be trained in applying the necessary procedural safeguards
![Page 27: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/27.jpg)
Anonymization
[Figure labels: Service Provider; Mining and Analytical Engine; InfoMobility; Socio-economic indicators; Health services]
![Page 28: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/28.jpg)
Educating the responsible data scientists
Based on a cooperation between ethicists and computer scientists:
1. A Massive Open Online Course (MOOC) which instructs all prospective researchers about the legal and ethical dangers of big data research and the steps they can take to minimise these;
2. A set of workflows that outline the steps researchers can take when designing their approach;
3. Information pop-ups which redirect researchers to state-of-the-art ethical methods.
![Page 29: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/29.jpg)
New challenges are coming
• One of the OECD ideals is algorithmic transparency, and the GDPR also says that decision-making algorithms should be explainable.
• But what is enough to constitute an explanation?
• We're working on developing some sort of template that would satisfy most people's conceptions of what an explanation should be
![Page 30: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/30.jpg)
New challenges are coming
• What function should terms of use have? Currently, SoBigData is trying to defer legal responsibility to the Final User through the ToS, but this is difficult.
• Example: How do we deal with Twitter's intellectual property rights? May the scraper violate Twitter's terms of use? (The point above also applies here.) How does SoBigData relate to data collectors' terms of service?
![Page 31: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/31.jpg)
SoBigData metadata structure
• A detailed, highly structured metadata structure has been designed to provide information about:
– Description of the dataset (to make it Findable)
– How the dataset has been produced
– Intellectual Property
– Privacy issues
– Who can access the data and how (terms of use, NDA…)
• Mainly based on the DataCite standard
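The categories of information listed on this slide could be represented as a simple record, sketched below. Every field name here is an illustrative assumption: it is neither the actual SoBigData metadata schema nor the DataCite property list, and the sample record is invented.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    # Hypothetical field names mirroring the categories on the slide.
    title: str
    description: str              # makes the dataset findable
    provenance: str               # how the dataset has been produced
    intellectual_property: str    # e.g. right holder or license
    privacy_issues: list = field(default_factory=list)
    access_conditions: list = field(default_factory=list)  # terms of use, NDA, ...

    def missing_fields(self):
        """Return the names of textual fields that were left empty."""
        return [
            name
            for name, value in asdict(self).items()
            if isinstance(value, str) and not value.strip()
        ]


# Invented sample record, for illustration only.
record = DatasetMetadata(
    title="Example mobility traces",
    description="Hypothetical sample record",
    provenance="",
    intellectual_property="to be defined by the data owner",
    privacy_issues=["contains personal data"],
    access_conditions=["terms of use", "NDA"],
)
incomplete = record.missing_fields()
```

A check such as `missing_fields` is one way a catalogue could flag records that do not yet document how the dataset was produced.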
![Page 32: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/32.jpg)
Managing data: disambiguating terms
• Copyright: Copyright is a legal right that grants the creator of an original work exclusive rights for its use and distribution for a limited amount of time. Copyright can exist on individual data as well as over a dataset or database as a whole. Factual data and metadata are not eligible for copyright protection.
• License: A license is a unilateral permission granted by the right holder (the licensor) to the licensee to use certain rights. Licenses differ from contracts in that a license does not require mutual agreement.
• Terms of Use: The terms of use are rules that one must obey in order to use the data or service. A terms of use agreement is mainly used for legal purposes by data providers and by databases that store data. A legitimate terms of use agreement is legally binding and may be subject to change.
![Page 33: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/33.jpg)
Managing data: types of data and their licenses
There are three broad types of data:
• Primary/raw data: data coming directly from the source;
• Derivative data: data processed from primary/raw data;
• Metadata: reference data describing either the primary/raw data or derivative data.
Primary/raw data and derivative data may be licensed under different conditions and by different stakeholders.
![Page 34: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/34.jpg)
Managing data: dealing with complexity
The e-infrastructure offering discovery and access may be operated by a different actor and offered through specific terms of use.
In this case three types of licenses may be involved:
1. the one agreed between the system operator and the primary data owner,
2. the one selected for the derivative product, which may differ from the one associated with the primary data, and
3. the one agreed between the system operator and the data consumer.
All these licenses have to be captured by the “terms of use” of the e-infrastructure, i.e., they are part of the rules a consumer must agree to accept when using the system.
![Page 35: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/35.jpg)
Managing data: VRE as an instrument to manage the complexity
A re-use license (to be specified in the ToU of the e-infrastructure) concerns at least attribution, copyleft requirements, and control over commercial exploitation of the dataset. Some form of access control must also be managed and applied.
Virtual Research Environments (VREs) offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. VRE terms of use are automatically composed according to the combined data and services selected at the time of VRE definition.
• Raw data are licensed according to the license expressed by the data owner/custodian at the time the data content is registered in the e-infrastructure.
• Derivative data products are instead licensed with a license compatible and legally interoperable with the one associated with the primary data.
• It remains the responsibility of the individual user, as expressed in the VRE terms of use, to confirm the license to associate with any derivative data produced.
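The automatic composition of VRE terms of use could look roughly like the sketch below. It models a license only by the three aspects named on this slide (attribution, copyleft, control on commercial exploitation). The class, the function names, the sample licenses and above all the compatibility rule are simplifying assumptions made for illustration; they do not describe how D4Science or SoBigData implement this, nor any real license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    attribution: bool      # reuse must credit the source
    copyleft: bool         # derivatives must keep the same conditions
    non_commercial: bool   # commercial exploitation is not allowed


def compose_terms_of_use(licenses):
    """Collect every condition imposed by at least one selected resource."""
    return {
        "attribution": any(lic.attribution for lic in licenses),
        "copyleft": any(lic.copyleft for lic in licenses),
        "non_commercial": any(lic.non_commercial for lic in licenses),
    }


def is_compatible(primary, derivative):
    """Assumed rule: a derivative may add restrictions but never drop one,
    and a copyleft primary requires exactly the same conditions."""
    if primary.copyleft:
        return (
            derivative.attribution,
            derivative.copyleft,
            derivative.non_commercial,
        ) == (primary.attribution, primary.copyleft, primary.non_commercial)
    return (derivative.attribution or not primary.attribution) and (
        derivative.non_commercial or not primary.non_commercial
    )


# Hypothetical licenses, named only for this example.
open_attr = License("open-with-attribution", True, False, False)
share_alike = License("share-alike", True, True, False)
restricted = License("non-commercial", True, False, True)

# Terms of use for a VRE combining two resources.
vre_terms = compose_terms_of_use([open_attr, restricted])
```

Under these assumptions, a user confirming a license for a derivative product would call `is_compatible(primary, derivative)` before registering it. Real license interoperability is a legal question that a boolean model like this cannot settle.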
![Page 36: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/36.jpg)
Metadata definition: Ethics
![Page 37: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/37.jpg)
Metadata definition: Intellectual Property
![Page 38: SoBigData. European Research Infrastructure for Big Data and Social Mining](https://reader035.vdocuments.us/reader035/viewer/2022081604/5878adbc1a28ab724c8b4cc1/html5/thumbnails/38.jpg)
The SoBigData.it research laboratory is organizing the first Tuscan Big Data Challenge, a free opportunity for companies to improve their business. Through advanced analysis of the data produced by companies or extracted from the internet, useful information can be obtained on several fronts.