
Page 1: General Invitation ToTenderdatalandscape.eu/.../files/report/...25_3_1_final_gm_EC_02…  · Web viewEuropean Data Market. SMART 2013/0063. D2 - Methodology Report. FINAL. 2nd August,

European Data Market

SMART 2013/0063

D2 - Methodology Report FINAL

2nd August, 2014


Author(s) Giorgio Micheletti, Gabriella Cattaneo, Rosanna Lifonti, Nina Bonagura (IDC)

David Osimo, Katarzyna Szkuta (Open Evidence)

Deliverable D2 Methodology Report

Date of delivery July 25, 2014

Version 3.0

Addressee officer Katalin IMREI

Policy Officer

European Commission - DG CONNECT

Unit G3 – Data Value Chain

EUFO 1/178, L-2557 Luxembourg/Gasperich


Contract ref. N. 30-CE-0599839/00-39


TABLE OF CONTENTS

1. INTRODUCTION

1.1. OVERVIEW

1.2. DEVELOPMENT OF THE EDM TAXONOMY

1.2.1. Main definitions

1.3. THE DATA MARKET VALUE CHAIN

2. DESIGN OF INDICATORS

2.1. INDICATOR 1: NUMBER OF DATA WORKERS

2.1.1. Definition and statistical reference

2.1.2. Description of the indicator

2.1.3. Main data sources

2.1.4. Gap analysis

2.1.5. Measurement approach

2.1.6. Qualitative interviews

2.2. INDICATORS 2 AND 3: NUMBER AND REVENUES OF DATA COMPANIES

2.2.1. Definition and statistical reference

2.2.2. Description of the indicator

2.2.3. Main data sources

2.2.4. Gap analysis

2.2.5. Field research surveys

2.2.6. Measurement approach

2.3. INDICATOR 4.1: SIZE OF THE DATA MARKET

2.3.1. Definition and statistical reference

2.3.2. Description of the indicator

2.3.3. Main data sources

2.3.4. Gap analysis

2.3.5. Measurement approach

2.4. INDICATORS 4.2-4.3: VALUE OF THE DATA ECONOMY

2.4.1. Definition and statistical reference

2.4.2. Description of the indicator

2.4.3. Main data sources

2.4.4. Gap analysis

2.4.5. Measurement approach

2.5. INDICATOR 5: DATA WORKERS SKILLS GAP

2.5.1. Definition and statistical references


2.5.2. Description of the indicator

2.5.3. Main data sources

2.5.4. Gap analysis

2.5.5. Measurement approach

2.6. INDICATOR 6: CITIZEN’S RELIANCE ON THE DATA MARKET

2.6.1. Definition and statistical references

2.6.2. Description of the indicator

2.6.3. Main data sources

2.6.4. Gap Analysis

2.6.5. Measurement approach

2.7. DATA COLLECTION: OVERVIEW OF FIELD RESEARCH

2.8. FORECASTING INDICATORS

2.8.1. Step 2: development of key assumptions

2.8.2. Step 3: scenarios development

2.8.3. Step 4: Forecast calculations

2.8.4. Steps 5 and 6: Communication, validation and final revision

2.9. INDICATORS FOR WORLDWIDE MONITORING

2.9.1. Description of International Indicators

2.9.2. Main Data Sources

2.9.3. Gap analysis

2.9.4. Measurement approach and output

3. DESIGN OF THE EDM MONITORING TOOL

3.1. OVERVIEW

3.2. IMPLEMENTATION APPROACH

3.2.1. Finalization of the EDM design and methodology

3.3. ASSESSMENT OF FRAMEWORK CONDITIONS

3.3.1. Policy/ regulatory Conditions

3.3.2. Market development – non regulatory conditions

3.4. DESIGN OF THE QUALITY CONTROL PROCESS

3.4.1. Metrics and Output

3.5. ASSESSMENT OF PROGRESS ON POLICY TARGETS

4. NEXT STEPS

4.1. NEXT STEPS

ANNEX 1 TAXONOMY, RELEASE II

INTRODUCTION

DESIGN OF THE DATA VALUE CHAIN


DATA MARKET DEFINITIONS

Definition of Data

Data Economy and Data Market

Data-related Companies

Data Users

Data workers and Data Scientists

DATA-DRIVEN INNOVATION

DATA PRODUCTS, SERVICES AND TOOLS

MAIN STAKEHOLDERS

Data Holders

Data companies

New - Specialized Intermediaries

ICT Enablers and Infrastructure providers

Final Users

Enabling Players

ANNEX 2 DATA WORKERS – SELECTED ISCO CODES

ANNEX 3 DATA COMPANIES – SELECTED CODES FROM NACE REV2

ANNEX 4 PEER REVIEW

Is the report complete, according to the expectations shown in the Inception report? Does it respond to the objectives set forth in the work plan?

Is the report clear and coherent in its statements, assessments and arguments?

Are the language and the format of the report of good quality?

What is your opinion on this chapter? Is the methodology appropriate? Is the chapter coherent, relevant, credible and well presented?

What is your opinion on the key definitions and the taxonomy presented in Annex I? Are they complete, clear, coherent, credible and useful for the objectives of the study?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Chapter

What is your opinion on the methodological approach for Indicator 1? Is it appropriate, coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Indicator:

What is your opinion on the methodological approach for Indicator 2 and 3? Is it appropriate, coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Indicator:

What is your opinion on the methodological approach for Indicator 4? Is it appropriate, coherent, relevant, credible and well presented?


Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Indicator:

What is your opinion on the methodological approach for Indicator 5? Is it appropriate, coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Indicator:

What is your opinion on the methodological approach for Indicator 6? Is it appropriate, coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Indicator:

What is your opinion on this methodology? Is it appropriate, coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Chapter

What is your opinion on this paragraph? Is the methodology approach appropriate? Is it coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Chapter

What is your opinion on this chapter? Is the methodology approach appropriate? Is it coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Chapter

MAIN REFERENCES

Figures

Figure 1 Main Objectives and work packages of the study

Figure 2 The data value chain and ecosystem

Figure 3 Classification of Data Companies

Figure 4 Worldwide Consumer Market Segments, 2013

Figure 5 The EDM Monitoring Tool

Tables

Table 1 ISCO-08 structure and data workers

Table 2 Indicator 1: Number of data workers

Table 3 Key External Datasets and Sources Leveraged for the estimation of the total number of data workers

Table 4 Key IDC Datasets and Sources Leveraged for the estimation of the total number of data workers

Table 5 Procedure for the estimation of the total number of data workers


Table 6 Procedure for the estimation of employment share and intensity share

Table 7 Selection of codes from Section J, NACE rev2, where data companies may be classified

Table 8 Selection of codes, Section M NACE rev2, where data companies may be classified

Table 9 Main industries and NACE codes where users may be classified

Table 10 Indicator 2: Number of data companies

Table 11 Indicator 3: Revenues of data companies

Table 12 Key External Datasets and Sources for the estimation of the data-supply companies and revenues

Table 12 Indicators of selection of MS for the survey

Table 13 Indicators of selection of MS for the survey

Table 14 Proposed survey sample

Table 15 Estimate process

Table 16 Indicator 4: Size of the data market

Table 17 Key External Datasets and Sources for the estimation of the data economy market

Table 18 Estimation procedure of the data market

Table 19 Indicators 4.2 and 4.3

Table 20 Indicator 5: Data Workers Skills Gap

Table 21 Indicator 6: Citizens’ reliance on the data market

Table 22 Summary of Field research activities

Table 23 Forecast of Main Data Market indicators

Table 24 Forecast of Main Data Economy indicators

Table 25 Forecast of Indicators 5 and 6

Table 26 – International Monitoring, Indicator 1

Table 27 – International Monitoring, Indicator 2

Table 28 – International Monitoring, Indicator 3

Table 29 – International Monitoring, Indicator 4

Table 30 – International Monitoring, Indicator 5

Table 31 EDM Indicators and key policy targets

Table 32 ICT Enablers and Infrastructure providers

Table 33 ISCO-08 codes where data workers may be included


1. Introduction

1.1. Overview

This is the revised version of the Methodology Report (Deliverable D2) of the study “European Data Market SMART 2013/0063”, entrusted to IDC and Open Evidence by the European Commission's DG CONNECT. It takes into account the feedback provided by the EC at the 2nd Interim meeting of July 2nd, 2014 and the peer review by Francesco Daveri. The report presents the overall conceptual framework of the study, identifying the main components of the European data market and ecosystem, and the design of the European Data Market Monitoring Tool.

The main goal of this study is to define, assess and measure the European data economy with the aim of supporting the EC’s Data Value Chain policy. The specific objectives are:

Objective A: The development of a European Data Market Monitoring Tool providing facts and figures on market size and trends;

Objective B: The collection and production of descriptive stories about the European data economy including quantitative facts and figures;

Objective C: The building of a community of relevant stakeholders in the EU to deepen internal connections among existing communities and reach out to different and new stakeholders.

The objectives will be reached through the work packages shown in Figure 1. This report is the output of WP1 Methodology Development.

Figure 1 Main Objectives and work packages of the study

Objectives: A. Development and Implementation of the European Data Market (EDM) Monitoring Tool; B. Production of descriptive stories on the EDM; C. Development of a Stakeholder Community.

Work packages: WP1 Methodology Development; WP2 Monitoring the EDM; WP3 Producing Stories on the EDM; WP4 Community Building and Management; WP5 Project Management.

Source: IDC 2013

This report is structured as follows:

Chapter 1 presents the introduction, the main goals and scope of the study, and outlines the methodological approach, including the refinement of the Data Value Chain and Taxonomy;

Chapter 2 presents in detail the design of the six indicators developed by the study, based on desk research on data availability, a gap analysis and a feasibility check; it covers both the EU monitoring and the international monitoring;


Chapter 3 presents the design of the EDM monitoring tool based on a sound methodology for the collection and measurement of facts and figures on the European data market;

Chapter 4 draws the main conclusions and outlines the next steps of the study.

The annexes include:

Release 3 of the Taxonomy;

The statistical codes identifying data workers;

The statistical codes identifying the selected industries targeted by the data companies survey;

The peer review carried out by Francesco Daveri;

The main references used in the study.

1.2. Development of the EDM Taxonomy

Our taxonomy is the starting point of the analysis and the foundation of the conceptual framework of the study. It provides definitions of all the main terms used in the analysis and in the monitoring tool, giving an objective basis for defining the indicators and the scope of their measurement. The taxonomy was developed on the basis of desk research on the main public sources and on IDC’s own taxonomies and research.

This report presents release 3 of the taxonomy, updated on the basis of desk research and of the mapping and classification of data companies presented in D.4.1, the “European Data Landscape” (available at http://datalandscape.eu/). It includes the definitions of:

Data and type of data;

Data market, data economy, data workers, data scientists, data companies;

Data skills;

Data-based products and services;

Main Stakeholders;

Main Framework Conditions.

These definitions are presented in a structured template, which has been made available to the stakeholder community for browsing and integration. Compared to release 2, release 3 updates the definition of data company to make it more operational for the field research.

The data market taxonomy is a live document which will be completed and updated as the study proceeds, including on the basis of feedback from stakeholders and the EC.

1.2.1. Main definitions

The following definitions are mainly based on OECD reports (2011, 2013)1 on data and the data economy.

Data is usually defined as qualitative or quantitative statements or information which can be coded and which are assumed to be factual and not the product of analysis or interpretation. For the purposes of this study, we consider only data which is collected, processed, stored and transmitted over digital information infrastructures and/or elaborated with digital technologies. This includes multimedia objects which are collected, stored, processed, elaborated and delivered for exploitation through digital technologies (for example, image databases).

1 OECD, Exploring the Economics of Personal Data: a Survey of Methodologies for Measuring Monetary Value, OECD Digital Economy Papers, n. 220

OECD, Exploring Data-Driven Innovation as a New Source of Growth: Mapping the Policy Issues raised by Big Data, OECD Digital Economy Papers, n. 222


Information is the output of processes that summarize, interpret or otherwise represent the content of a message in order to convey meaning. Information is therefore not a mere synonym of data.

We also need to distinguish between the data market and the data economy as follows.

The data market is the market where digital data is exchanged as “products” or as “services” derived from raw data. The exploitation of the exchanged data enables a better understanding of the environment and helps improve existing services, increase efficiency and eventually launch new products and services, including in the more traditional sectors of the economy (such as manufacturing, transport or retail).

The data economy involves the generation, collection, storage, processing, distribution, analysis, elaboration, delivery and exploitation of data enabled by digital technologies. The data economy also includes the direct, indirect and induced effects of the data market on the economy.

It should be noted that the data economy is not synonymous with the knowledge economy, a broader concept which can be defined as follows:

We define the knowledge economy as production and services based on knowledge-intensive activities that contribute to an accelerated pace of technical and scientific advance, as well as rapid obsolescence. The key component of a knowledge economy is a greater reliance on intellectual capabilities than on physical inputs or natural resources2.

This sociological definition does not readily map onto the main official statistics. In this study, therefore, we have developed our own definitions and linked them to the main statistical classification codes.

The OECD has led decade-long work on defining the information economy and the knowledge economy and on their correspondence to the main statistical sources, eventually concluding that their perimeter comprises the ICT sector plus the content and media sector3. In recent years the OECD has focused on measuring the Internet economy, defined as follows:

“The Internet economy is defined as covering the full range of our economic, social and cultural activities supported by the Internet and related information and communications technologies”4.

OECD research thus confirms that the data economy is a new phenomenon, correlated with the Internet economy but not coinciding with it. For example, the OECD report “Measuring Big Data related industries: an exploration” (2012) identified big data related industries as those collecting, processing and diffusing digital data, including directory publishing, data processing, hosting and related activities, and web portals. This is a considerably smaller perimeter than the one considered in this study (see par. 2.2 on data companies). A forthcoming OECD report, “Measuring the digital economy: a new perspective” (currently circulated in draft), presents the most recent and exhaustive list of key ICT indicators; it mentions the emergence of big data analytics but does not include any specific indicator for it. These considerations show that the characteristics and boundaries of the emerging data industry and market are still evolving and have not yet been consolidated in the literature. As shown in the rest of this report, this makes it necessary to carry out field research to collect fresh data for measuring these indicators.

2 The Knowledge Economy, Paper for the Annual Review of Sociology, Stanford University, 2004

3 Guide to Measuring the Information Society, 2009

4 “Measuring the Internet Economy: A Contribution to the Research Agenda”, OECD Digital Economy Papers, 2013


1.3. The Data Market Value Chain

Figure 2 presents the design of the data value chain within the data ecosystem. It is the starting point of the conceptual framework with which we approach the measurement of the data market and the data economy.

The data value chain may be revised and updated based on the results of the research to be carried out in WP2.

Figure 2 The data value chain and ecosystem

Data value chain: Data collection and creation; Storage, aggregation, organization; Analysis, processing, marketing and distribution.

Microeconomic impacts: cost savings; increased flexibility thanks to timely and improved decision making; new products/services; improved customer services; increased revenue.

Macroeconomic impacts: GDP growth; SMEs and job creation; data-driven competitiveness of the EU industry.

Framework conditions of development of the European data economy (policy/regulatory and non-regulatory): data privacy; data ownership; copyright; security; skills; infrastructures; interoperability, standards; access to risk capital.

Stakeholder categories: ICT enablers and cross infrastructures; data holders; new intermediaries; final users (internal/external use); vertically integrated suppliers.

Other labels: primary use; re-use; information services.

Source: IDC 2013

Figure 2 is composed of the following elements, which describe the structure of the data economy:

The data value chain shows the four main phases of data manipulation that lead to its exploitation;

The macroeconomic and microeconomic impacts outline the direct and indirect impacts of the data value chain on the economic system and user enterprises;

The stakeholder categories identify the main types of actors on the basis of their role in the data value chain;

The framework conditions identify the main factors which will enable or prevent the development of the European data market and economy. They are divided into policy-regulatory framework conditions and non-regulatory conditions.

While we have identified multiple stakeholders with multiple roles, the leading global web platforms such as Amazon, Google and Facebook dominate the whole value chain; their vertical integration and market dominance represent a huge competitive advantage. The framework conditions (for example the Digital Single Market) are instead of primary relevance for all the other players, particularly native EU players.

Within the framework of the present study, the main steps of the data value chain to be taken into consideration are as follows:

Collection/access of data from a myriad of sources within the applicable legal framework. Collection can be direct (for example through loyalty schemes operated by retailers, transport and hospitality service providers) or indirect (for example by recording the location of someone using a cellular phone). Data can also be created through analysis rather than being captured;

Storage and aggregation by service providers and social networks, but also by companies in traditional sectors such as finance, retail, transport, utilities, government;

Processing, analysis, marketing and distribution: merging data from different sources (public, proprietary or institutional research) and relying on analytics to derive insights and value. Traditional players across vertical markets can perform this task if they have the necessary skills/technology; alternatively, they can rely on external data brokers and providers;

Usage, in both the public and private sectors, to better serve customers and/or improve efficiency. The usage of data is broken down into primary use (when data is used for the goal for which it was collected, for example when a telecom company uses mobile traffic data to bill customers) and secondary use or re-use (when data is exploited for other goals, for example when mobile traffic data is used to map customers' movements for a retail company). Re-use is expected to be the source of much of the value added of the data market.
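As a minimal illustration, the four value-chain steps and the primary use / re-use distinction can be encoded so that observations collected during the research are tagged consistently. The names `ValueChainStage` and `classify_use` are hypothetical, and reducing a "purpose" to a plain string comparison is a simplifying assumption.

```python
from enum import Enum


class ValueChainStage(Enum):
    """The four steps of the data value chain considered in the study."""
    COLLECTION = "Collection/access of data"
    STORAGE = "Storage and aggregation"
    PROCESSING = "Processing, analysis, marketing and distribution"
    USAGE = "Usage"


def classify_use(collection_purpose: str, usage_purpose: str) -> str:
    """Hypothetical helper: a use is 'primary' when data serve the goal for
    which they were collected; any other goal is a secondary use ('re-use')."""
    if collection_purpose.strip().lower() == usage_purpose.strip().lower():
        return "primary use"
    return "re-use"


# The telecom example from the text: traffic data collected for billing.
print(classify_use("billing", "billing"))                     # primary use
print(classify_use("billing", "mapping customer movements"))  # re-use
print(len(ValueChainStage))                                   # 4
```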


2. Design of Indicators

Approach

Over the past decades, scholars have agreed that the economies of developed countries have become driven by technologies based on the production and dissemination of knowledge and information. The knowledge economy relates to production and services based on knowledge-intensive activities that contribute to an accelerated pace of technological and scientific advance. It relies more on intellectual capabilities than on physical inputs or natural resources, and knowledge can be embodied in both goods and services. These changes in production processes are part of a broader shift from tangible goods and production factors to intangible or information goods (Shapiro & Varian 1999).

Studies using macro-level data have tended not to find a link between technology and productivity, whereas studies relying on finer-grained, firm-level data have proved more effective and have captured a much larger effect of technology on productivity and other outcomes. This pattern of results points to the difficulty of measuring aggregate output (Brynjolfsson and Hitt, 2000; Powell and Snellman, 2004). A number of studies (Brynjolfsson & Hitt 1995, 2000) also showed that technology enables complementary organizational investments, which in turn reduce costs and improve output quality, leading to long-term productivity increases.

Case studies and econometric studies continually provide evidence that organizational complements such as new business processes, new skills and new organizational and industry structures are major drivers of the contribution of IT and of the knowledge economy.

The data market is part of the knowledge economy and is, moreover, still emerging. There are still few studies on the data market, and little is known about its impacts at the macroeconomic level.

The supply chain shown in Figure 2 in the previous chapter frames the European Data Market and its intermediate inputs in terms of supply and demand. This calls for an analysis based on a micro-economic rather than a macro-economic approach.

A micro-foundation approach means that the analysis will be based on the behaviour of individual agents. Assumptions will be made at the micro-economic level and, where necessary, aggregating the micro-economic results will lead to conclusions at the macro-economic (aggregate) level.

Micro-economic models are better suited to predicting the aggregate impact of both policy changes and emerging markets such as the data market. One of the main open issues is that it is still unclear how to assess the net effects of innovation paths on the economic system.

A micro-foundation approach is based on the benefits of, and reasons for, implementing and adopting data products and services. Such direct benefits are outcomes of the data market. To analyse these benefits and effects, we will start from the building blocks of the supply chain, namely the supply of and demand for data products and services. This is a traditional supply-demand approach and will be presented extensively in the next chapter.

To adopt a micro-foundation approach, we need to examine the impacts of the adoption of data-related products and services on economic agents, which requires a micro-economic analysis of both the supply side and the demand side. The proposed methodological approach is pragmatic and based on both desk research and primary research. The objective is to propose and measure a set of indicators able to capture the most relevant impacts arising from the diffusion of new goods, such as data-related products and services, in the economic system.


In the following paragraphs, we present a selection of indicators to be measured. Data-related products and services are not classified in economic statistics, so one of the major issues is to identify the data we can use to build an economic analysis.

The next paragraphs, from 2.2 to 2.6, present the indicators we propose to estimate. Each follows the same structure:

Definition and statistical reference of the indicator: each indicator is first defined from a conceptual point of view and then from a statistical point of view. Since statistics are not available for data products and services, we need to explore proxies or statistical variables that can be used to estimate the indicators of interest;

Description of the indicators: a summary table presents the indicators we wish to measure with the corresponding short description and segmentation (geographical, company size, industry);

Main data sources: a selection of the available data sources that may be useful for estimating the indicators. This paragraph also explores the segmentation of the available data. Both public and private sources are considered;

Gap analysis: the data selected in the previous paragraph are assessed in terms of availability (whether the data are available, including for the desired segmentation), quality and reliability (a quality assessment of the data), and feasibility of the indicator (based on availability and reliability, whether the estimation of the indicator is feasible and with which segmentation). Availability, reliability and feasibility are scored with a colour (green = high, yellow = medium, red = low). Where the feasibility of an indicator is very low, we will plan a survey to fill the data gap;

Measurement approach: we first present a number of assumptions, then the estimation approach and the outputs of the measurement. This is a proposal representing our first-best approach, which may need to be modified during the actual estimation if unexpected problems arise. The measurement approach ends with some suggested sanity checks; further checks may be suggested during the estimation process;

Time covered by the indicators: all indicators will be estimated for the years 2013-2014. Note that most data collected through official statistics will not yet be updated to 2013-2014: national statistics usually lag by one year, sometimes more, and time series may have gaps (by country or by year). The contractor will therefore have to fill the gaps in the official statistics to make them usable, estimating the missing values with traditional, consolidated statistical and economic methods.
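The report does not specify which gap-filling method will be used. As one hedged example of a consolidated technique, the sketch below fills interior gaps in a yearly series by linear interpolation and extends it to 2013-2014 by compounding the most recent observed annual growth rate. The function name `fill_series` and the sample values are hypothetical.

```python
from __future__ import annotations


def fill_series(series: dict[int, float | None], target_years: range) -> dict[int, float]:
    """Fill a yearly series with gaps (None) and extend it to target_years.

    Assumed method (illustrative only): linear interpolation between observed
    years, then extrapolation at the compound annual growth rate of the last
    two observed years.
    """
    observed = sorted((year, value) for year, value in series.items() if value is not None)
    if len(observed) < 2:
        raise ValueError("at least two observed years are needed")
    filled = dict(observed)
    # Interior gaps: straight line between consecutive observations.
    for (y0, v0), (y1, v1) in zip(observed, observed[1:]):
        for year in range(y0 + 1, y1):
            filled[year] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
    # Trailing gaps: compound growth from the last observed value.
    (y_prev, v_prev), (y_last, v_last) = observed[-2], observed[-1]
    growth = (v_last / v_prev) ** (1 / (y_last - y_prev)) - 1
    for year in target_years:
        if year > y_last:
            filled[year] = v_last * (1 + growth) ** (year - y_last)
    return dict(sorted(filled.items()))


# Hypothetical employment series (thousands) with a missing 2011 value.
employment = {2010: 100.0, 2011: None, 2012: 110.0}
print(fill_series(employment, range(2013, 2015)))
```

Interpolation assumes smooth change between observed years; where official series show breaks, a different method (for example benchmarking on a related series) may be more appropriate.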

The last paragraph of the chapter presents the approach used to forecast the selected indicators over the medium to long term (to 2020). These forecasts will be based on scenarios.
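Scenario forecasts can be sketched, under the assumption that each scenario is summarised by a constant compound annual growth rate (CAGR), as below. The scenario names and growth rates are invented placeholders, not the study's scenarios.

```python
from __future__ import annotations


def project(base_value: float, base_year: int, target_year: int, cagr: float) -> float:
    """Compound a base-year value at a constant annual growth rate."""
    return base_value * (1 + cagr) ** (target_year - base_year)


# Hypothetical scenario growth rates, for illustration only.
SCENARIOS = {"baseline": 0.05, "high growth": 0.08, "challenge": 0.02}


def scenario_forecasts(base_value: float, base_year: int = 2014,
                       target_year: int = 2020) -> dict[str, float]:
    """Project one indicator to target_year under each scenario."""
    return {name: project(base_value, base_year, target_year, rate)
            for name, rate in SCENARIOS.items()}


# Hypothetical 2014 value of an indicator (e.g. data workers, in thousands).
for name, value in scenario_forecasts(6000.0).items():
    print(f"{name}: {value:,.0f}")
```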


2.1. Indicator 1: Number of Data Workers

2.1.1. Definition and statistical reference


Data workers: definition and discussion

Data workers collect, store, manage and analyze data as their primary activity: they should be able to work with massive databases and with emerging database technologies. Data workers belong to the category of knowledge workers, and specifically to the category of “codified” knowledge (Lundvall and Johnson, 1994); however, data workers specifically deal with data, while knowledge workers deal with information and knowledge. Data entry clerks' primary activity is related to data, so they could be considered data workers; however, data entry is a very routine task, and for the purposes of this study data entry clerks are not considered knowledge workers. Another specific category of data workers is data analysts, who usually extract and analyse information from a single source, such as a CRM database. They require a medium level of creative thinking and usually work on structured data.

Within the broader category of data workers, we include the category of data scientists.

Data scientists require solid knowledge of statistical foundations and advanced data analysis methods, combined with a thorough understanding of scalable data management and the associated technical and implementation aspects (European Big Data Value Partnership Strategic Research and Innovation Agenda, April 2014). They can deliver novel algorithms and approaches such as advanced learning algorithms, predictive analytics mechanisms, etc. Data scientists should also have a deep knowledge of their business; the most difficult skills to find include advanced analytics and predictive analysis skills, complex event processing skills, rule management skills, business intelligence tools and data integration skills (UNC, 2013).

We are currently not in a position to estimate the number of data scientists, but it is useful to clarify that they are part of the data workers, for the following reasons:

Because a valuable system of indicators has to consider not only the definitions and indicators that are useful and feasible today, but also the indicators that will become feasible once the industry moves beyond its initial stage;

Because one of the next indicators relates to the skills gap. Again, the skills gap will be calculated with reference to data workers in general. Nevertheless, when discussing the skills gap it will be important to be aware of the skills needed, and these can only be explored by considering the skills specific to data scientists, not only those of the larger category of data workers.

In short, data scientists will not be estimated separately, but it is important to be aware of the skills they require.

Technology is an enabling factor that is transforming the generation and use of data, as well as the related added value (OECD, 2013). Economic and social activities have long made use of data, but in recent years the development of ICTs has increasingly enabled the economic exploitation of data. Along the data value chain, technology is important to:

reduce costs;

increase the generation and use of data;

accelerate the migration of socioeconomic activities to the Internet, with a wide adoption of e-services.

Based on this definition, data workers are found on both the supply side and the demand side of data products and services: in the first case they deliver data products and services, in the second they use them, for example to take decisions within their enterprises.

Data workers and data scientists are therefore workers who use IT to create value from raw data available on the market. Data workers create value because:

They are employed on the supply side and they create value through the sale of data products and services;

They are employed on the demand side and create value by using data products and services to improve the competitiveness and productivity of their companies.

The use of data is pervasive and has penetrated every industry and business function: data are now a relevant production factor alongside labour and capital.

Statistical definition of the data workers

Data workers are not classified as such in any labour or occupation statistics. As usually happens with emerging sectors and industries, the related variables cannot be traced in consolidated statistics. To estimate such variables, we therefore need to trace the indicators we are interested in within more general data, and to define an approach to estimate them.

To define data workers statistically, we have adopted the International Standard Classification of Occupations (ISCO-08). The ISCO classification naturally has no category referring to data workers or data scientists; nevertheless, we can determine in which ISCO-08 categories data workers and data scientists may be classified. This paragraph presents the ISCO-08 categories where data workers may be classified and counted.

In Annex 2, the detailed table with the list of the ISCO-08 codes selected is presented.

The criteria adopted for the selection of the ISCO-08 codes are the following:

We have selected the occupations where data workers can be involved either as data providers or as data users;

We have selected occupations at the 1- to 4-digit level of disaggregation;

The occupation codes selected are those where the presence of data workers can be detected because:

  - they hold deep analytical skills;
  - they do not need deep analytical skills but have a basic understanding of statistics and/or machine learning, in order to conceptualize the questions that can be addressed through deep analytical skills;
  - they provide enabling technology and are therefore providers of data services;

The selected codes are those where a significant share of the workers may be data workers; occupations where data workers are a very marginal part of the workforce have been excluded. For example, medical practitioners have been excluded: although some practitioners may be data workers because they undertake research activities, they are only a very marginal part of the profession;

We excluded all data workers who fall outside the knowledge economy perimeter because their occupation is low-skilled, i.e. highly routine. For example, call centre workers are in theory data workers, but since their activity is routine and as such excluded from the knowledge economy, they are not considered data workers.
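The selection criteria above can be expressed as a simple filter. This is a sketch only: the ISCO-08 unit groups below are real codes chosen to mirror the examples in the text, but the data-worker shares, the routine flags and the 10% threshold standing in for "a significant share of the workers" are hypothetical and do not reproduce the selection listed in Annex 2.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Occupation:
    isco_code: str            # ISCO-08 unit group (4 digits)
    title: str
    data_worker_share: float  # hypothetical share of workers who are data workers
    routine: bool             # True for low-skilled, highly routine occupations


MIN_SHARE = 0.10  # hypothetical threshold for a "significant" share


def select_codes(occupations: list[Occupation]) -> list[str]:
    """Keep occupations where data workers are a significant share and the
    work is not routine (i.e. inside the knowledge-economy perimeter)."""
    return [o.isco_code for o in occupations
            if o.data_worker_share >= MIN_SHARE and not o.routine]


candidates = [
    Occupation("2120", "Mathematicians, actuaries and statisticians", 0.70, False),
    Occupation("2521", "Database designers and administrators", 0.80, False),
    Occupation("2211", "Generalist medical practitioners", 0.02, False),  # marginal
    Occupation("4132", "Data entry clerks", 0.90, True),                  # routine
    Occupation("4222", "Contact centre information clerks", 0.50, True),  # routine
]
print(select_codes(candidates))  # ['2120', '2521']
```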

The selected codes finally include:

4 major groups (1-digit codes);

9 sub-major groups (2-digit codes);

21 minor groups (3-digit codes);

52 unit groups (4-digit codes).

The weight of the codes including data workers is shown in the table below: 4 out of 10 major groups (1 digit) include data workers, while at the lowest level of disaggregation 12% of the unit groups include data workers. This represents the perimeter to be assessed.

Table 1 ISCO-08 structure and data workers

ISCO-08 structured classification | Major groups (1 digit) | Sub-major groups (2 digits) | Minor groups (3 digits) | Unit groups (4 digits)
Number of codes in the ISCO-08 structure | 10 | 43 | 130 | 436
Number of selected codes including data workers | 4 | 9 | 21 | 52
Share of data-worker codes in the ISCO-08 structure | 40% | 21% | 16% | 12%

Source: IDC elaboration on ISCO codes
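The shares in the last row of Table 1 are simply the selected codes divided by the total codes at each level of the classification; the short check below recomputes them from the table's own figures.

```python
ISCO_TOTAL = {"major (1 digit)": 10, "sub-major (2 digits)": 43,
              "minor (3 digits)": 130, "unit (4 digits)": 436}
SELECTED = {"major (1 digit)": 4, "sub-major (2 digits)": 9,
            "minor (3 digits)": 21, "unit (4 digits)": 52}


def code_shares(selected: dict, total: dict) -> dict:
    """Percentage of ISCO-08 codes at each level that include data workers."""
    return {level: round(100 * selected[level] / total[level]) for level in total}


print(code_shares(SELECTED, ISCO_TOTAL))
# {'major (1 digit)': 40, 'sub-major (2 digits)': 21, 'minor (3 digits)': 16, 'unit (4 digits)': 12}
```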


2.1.2. Description of the indicator

Table 2 Indicator 1: Number of data workers

Indicator 1 - Description

N. | Name | Description | Type and time | Segmentation
1.1 | Number of data workers | Total number of data workers in the EU/EFTA | Absolute number, 2013-2014 est. | By geography: 28 EU MS + total EU; EFTA (except Liechtenstein). By industry: 11 sections (economic activities from 11 NACE Rev. 2 sections)
1.2 | Employment share | Total number of data workers compared with total employment in the EU/EFTA | % of data workers in total employment, 2013-2014 est. | By geography: 28 EU MS + total EU; EFTA (except Liechtenstein). By industry: 11 sections (economic activities from 11 NACE Rev. 2 sections)
1.3 | Intensity share | Average number of data workers per company, i.e. ratio of the total number of data workers to the number of companies (private sector only) | Absolute number, 2013-2014 est. | By geography: 28 EU MS + total EU; EFTA (except Liechtenstein). By industry: 11 sections (economic activities from 11 NACE Rev. 2 sections)
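Indicators 1.2 and 1.3 are ratios built on indicator 1.1. A minimal sketch of the two formulas follows; the function names and input figures are hypothetical and, as noted in Table 2, the intensity share should use private-sector data workers and companies only.

```python
def employment_share(data_workers: float, total_employment: float) -> float:
    """Indicator 1.2: data workers as a percentage of total employment."""
    return 100 * data_workers / total_employment


def intensity_share(private_data_workers: float, n_companies: int) -> float:
    """Indicator 1.3: average number of data workers per company
    (private sector only, so both inputs must exclude the public sector)."""
    return private_data_workers / n_companies


# Hypothetical figures for one country and one NACE section.
data_workers = 5_000          # indicator 1.1 (headcount)
total_employment = 200_000
private_data_workers = 3_000
companies = 1_500
print(employment_share(data_workers, total_employment))  # 2.5
print(intensity_share(private_data_workers, companies))  # 2.0
```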

Ideally, it would be interesting to calculate the intensity share (the average number of data workers per company) by company size, since a comparison across company sizes would be of high interest; unfortunately, we are not in a position to split this indicator by company size.

Nevertheless, we propose to keep the indicator and calculate it for all companies as a whole, since it is useful to track over time the evolution of data companies.

The number of data workers is a headcount: a worker is counted as a data worker when their main activity relates to data, i.e. when the majority of their working time is devoted to data.
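The headcount rule can be sketched as follows, assuming that the share of each worker's time spent on data is known (an illustrative assumption; the study does not observe this share directly). The function name is hypothetical.

```python
from typing import Iterable


def count_data_workers(data_time_shares: Iterable[float]) -> int:
    """Headcount of data workers: each worker counts as one when more than
    half of their working time is devoted to data, and as zero otherwise."""
    return sum(1 for share in data_time_shares if share > 0.5)


# Hypothetical shares of working time spent on data for four workers.
shares = [0.8, 0.5, 0.3, 0.6]
print(count_data_workers(shares))  # 2 (0.5 is not a majority)
```

A full-time-equivalent measure would instead sum the shares (2.2 for the same four workers); the headcount definition counts whole persons.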

As indicated in Chapter 2.9, the market monitoring operated by indicator 1 will be extended to three leading international competitors of the EU in the data market: the US, China or Japan and Brazil. The approach will be similar to the one described above for the EU and is further detailed in Paragraph 2.9.1 of the present report.

2.1.3. Main data sources

As the data worker population is not measured directly by statistical institutions, IDC monitored existing employment statistics to extract the elements that may be useful to design the profile of data workers and estimate their number. In doing so, IDC took care to understand the integrity and comparability of the sources, their limitations and how to fill their gaps, while running a preliminary feasibility assessment across key dimensions such as country, sector and company size.

The starting point for quantifying the data worker population is the identification of the underlying population, i.e. the total number of workers employed by country, size and vertical market, together with the like-for-like number of companies needed to calculate the intensity share (i.e. data workers per company).

In order to identify such population, IDC looked into:

The EU Labor Force Survey (EU LFS), which provides data on employment (resident concept) of persons aged 15 and over by European country and vertical market according to NACE Rev. 2. No indication of company size is available.

Auxiliary indicators to National Accounts by branch which provide employment (domestic concept) by European country and vertical market according to NACE Rev. 2. No indication of company size is available.

Eurostat's Structural Business Statistics (SBS), which describe the structure and performance of businesses across European countries in terms of number of companies, persons employed and revenues for industry, construction, trade and services. Data are available by company size.

Eurostat's Business Demography statistics, part of Structural Business Statistics, which present data on the number of active enterprises at time t and on employment in the same population of enterprises by NACE Rev. 2. Data are available by company size, with a special focus on smaller size classes (below 10 employees).

Structural business statistics, however, cover only the market economy and do not include financial services, education, healthcare, and government. For those segments that are not tracked in structural business statistics IDC will leverage other specific sources to estimate missing data and provide a harmonized view of employment and number of companies/institutions. The sources to be leveraged include:

the European Central Bank, for structural indicators on European monetary credit institutions;

Insurance Europe, for facts and figures related to the European insurance market;

education and health statistics published by the national Ministries of Education and Health, as well as data on healthcare resources from the World Health Organization;

data sourced directly from National Statistics Offices, including, where available, Public Administration Censuses.
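The harmonisation step can be sketched as a merge in which SBS figures take precedence and the supplementary sources fill only the sections SBS does not cover. Mapping the excluded segments to NACE Rev. 2 sections K (financial and insurance activities), O (public administration), P (education) and Q (human health) is our reading of the text; the function name and figures are hypothetical.

```python
from __future__ import annotations

# NACE Rev. 2 sections for the segments the text says SBS does not cover.
SBS_EXCLUDED = {"K", "O", "P", "Q"}


def harmonise(sbs: dict[str, int], supplementary: dict[str, int]):
    """Merge employment by NACE section: SBS where available, otherwise a
    supplementary source (ECB, Insurance Europe, ministries, WHO, NSOs).
    Returns the merged figures and any excluded section still missing."""
    merged = dict(sbs)
    for section, value in supplementary.items():
        merged.setdefault(section, value)  # never overwrite an SBS figure
    still_missing = SBS_EXCLUDED - set(merged)
    return merged, still_missing


# Hypothetical employment figures (thousands) for one country.
sbs = {"C": 1200, "J": 300}
supplementary = {"K": 150, "P": 400, "C": 1100}
merged, missing = harmonise(sbs, supplementary)
print(merged)           # {'C': 1200, 'J': 300, 'K': 150, 'P': 400}
print(sorted(missing))  # ['O', 'Q']
```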

Having identified the sources on which the model is founded (total employment and number of companies by country, sector and company size), the second step was to assess the availability of datasets allowing the workforce to be categorised by relevant occupation according to the ISCO-08 classification.

Two main datasets provide such information. In particular:

International Labour Organization's ILOSTAT Database, publicly available, provides for all countries in scope the total employed population by economic activity and occupation (major group). No indication of company size is available.

Eurostat provides on-demand data extractions from the EU Labour Force Survey, including employment by country (and EU28), economic activity and ISCO-08 occupation (minor group level, i.e. three digits). The higher the level of detail requested, the more likely Eurostat is to cap figures, both for data reliability and for confidentiality reasons. Even so, these datasets provide valuable inputs to the model. No indication of company size is available.


Where datasets prove insufficient to form solid assumptions, IDC will calibrate assumptions on internal data and information by occupation and by vertical market, and will estimate the missing values in the datasets.

Finally, to assess country-specific maturity of the data market and the penetration of data workers in the knowledge economy (capturing therefore data workers not engaged in routine tasks) IDC will leverage internal resources as well. In particular, the following key sources are used:

IDC's Worldwide Semiannual Software Market Forecaster, which monitors historical software vendor revenues twice a year and provides 5-year forecasts of supply-side revenues for 79 software functional areas. This data is relevant because it indicates end-user maturity in all the software areas most closely connected to the operations of the data workforce. In particular, the penetration rates of data access, analysis and delivery products (i.e. end-user-oriented tools for ad hoc data access, analysis and reporting, as well as production reporting, most commonly used by information consumers or power users rather than by professional programmers) make it possible to identify the country-level maturity of the data workforce.

Even more closely connected to the data marketplace and the data workforce are the data monitored and forecast twice a year in IDC's Worldwide Semiannual Business Analytics Software Tracker, which specifically covers the software making up the business analytics competitive market.

IDC will also leverage knowledge from end-user surveys (for example, IDC's European Vertical Markets Survey, IDC's European Software Survey, IDC's European Services Survey and IDC's Big Data and Analytics Maturity Benchmark Survey) to build industry- and country-specific correction factors and to validate results.
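Putting these elements together, one plausible shape for the estimate is a weighted sum of employment over the selected ISCO-08 minor groups (the level available from the LFS extractions), scaled by a correction factor of the kind built from IDC surveys. This is a hedged sketch, not the study's own estimation model; the group codes are real ISCO-08 minor groups used only as examples, while the shares, figures and correction value are hypothetical.

```python
from __future__ import annotations


def estimate_data_workers(employment_by_isco: dict[str, float],
                          data_worker_share: dict[str, float],
                          correction: float = 1.0) -> float:
    """Weighted sum of employment over the selected ISCO-08 groups, scaled by
    a country/industry correction factor. Groups without employment data are
    skipped here; in practice they would be estimated first."""
    base = sum(employment_by_isco[code] * share
               for code, share in data_worker_share.items()
               if code in employment_by_isco)
    return base * correction


# Hypothetical LFS-style employment by ISCO-08 minor group (3 digits).
employment = {"212": 10_000, "251": 40_000, "252": 20_000}
shares = {"212": 0.5, "251": 0.25, "252": 0.5}  # hypothetical data-worker shares
print(estimate_data_workers(employment, shares, correction=1.25))  # 31250.0
```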

Tables 3 and 4 summarize the key datasets that will be used to quantify indicator 1, including their country coverage and the vertical/company size dimensions available.


Table 3 Key External Datasets and Sources Leveraged for the estimation of the total number of data workers

Data owner | Domain | Title | Country coverage | Coverage
Eurostat | Structural Business Statistics | Annual enterprise statistics by size class for special aggregates of activities (NACE Rev. 2) [sbs_sc_sca_r2] | All EU28 countries (excl. Malta) | Section/Division; sectors covered expressed in NACE divisions: B-J; L-M; S95. Sizes: 0-9; 10-49; 50-249; 250+
Eurostat | Science and technology | Employment in technology and knowledge-intensive sectors at the national level, by sex (from 2008 onwards, NACE Rev. 2) [htec_emp_nat2] | All EU28 countries | Section; sectors covered based on division level: A-S; aggregations for high-technology manufacturing and knowledge-intensive high-technology services
Eurostat | Science and technology | Employment in technology and knowledge-intensive sectors at the national level, by type of occupation (from 2008 onwards, NACE Rev. 2) [htec_emp_nisco2] | All EU28 countries | Section; sectors covered based on division level: A-S; aggregations for high-technology manufacturing and knowledge-intensive high-technology services; ISCO: top level (i.e. Professionals; Technicians and associate professionals; Other)
Eurostat | Economy and finance | National accounts aggregates and employment by branch (NACE Rev. 2) (nama_nace2) | All EU28 countries | Depending on the branch level selected (10 branches correspond to NACE Rev. 2 section level, while 64 branches report data at division level)
Eurostat | EU Labour Force Survey (on households) | Employment by economic activity and occupation | All EU28 countries | All sectors; ISCO-08 minor groups
Eurostat | Structural Business Statistics | Business Demography Statistics (active population of enterprises; birth; survival and death) | All EU28 (excl. Greece and Croatia) | Section/Division/Groups; sectors covered expressed in NACE divisions: B-S; sizes: 0; 1-4; 5-9; 10+
Insurance Europe | European Insurance in Figures | Statistics N°48: European Insurance in Figures dataset (2012 data) | All EU28 (excl. Lithuania) | Number of insurance companies, employees and gross written premiums
European Central Bank | Structural Indicators for the EU Banking Sector | EU structural financial indicators | All EU28 (excl. Croatia) | Local branches; employees of domestic credit institutions
ILO | Labour Force | Employment by sex and economic activity | All EU28 countries | Section; sectors covered expressed in NACE divisions: A-U
ILO | Labour Force | Employment distribution by economic activity and occupation | All EU28 countries | All sectors; ISCO-08 major groups

Table 4 Key IDC Datasets and Sources Leveraged for the estimation of the total number of data workers

Data owner | Domain | Title | Country coverage | Coverage
IDC | Software | Worldwide Semiannual Software Market Forecaster | AT, BE, CZ, DK, FI, FR, DE, GR, HU, IE, IT, NL, PL, PT, RO, ES, SE, UK, Rest of CEE | -
IDC | Software | Worldwide Semiannual Business Analytics Software Tracker | AT, BE, CZ, DK, FI, FR, DE, GR, HU, IE, IT, NL, PL, PT, RO, ES, SE, UK, Rest of CEE | -
IDC | Business Analytics & Big Data | Big Data and Analytics Maturity Benchmark Survey | - | Telecommunications Services Provider, Commercial/Retail/Investment Banks, Government, Non-Food Retail, Oil & Gas, Acute Care Hospitals, Food and Beverage Manufacturer or Private Label Grocer
IDC | Vertical markets and SMEs | European Vertical Markets Survey | UK, FR, DE, ES, IT | All verticals, including details by sub-vertical; sizes: 10+ (data can be aggregated into 10-249 and 250+)
IDC | Software | European Software Survey | FR, DE, IT, NL, the Nordics, ES, UK | Financial Services, Manufacturing, Other Services Industries, Public Sector, Retail/Wholesale; sizes: 50+ employees
IDC | IT services | European Services Survey | UK, FR, DE, IT, ES, BE, NL, LU, Nordics | Manufacturing, Transport, Telco, Utilities, Retail/Wholesale, Financial Services, Business services, Government, Education, Health; sizes: 250+ employees


2.1.4. Gap analysis

N. | Indicator name | Type of data | Availability of data: public sources | Availability of data: IDC & other private sources | Quality and reliability of data | Feasibility of indicator

1.1, 1.2 | Number of data workers; Employment share | Statistics on occupation and ICT intensity by industry | M | M | H | EU = High; by geography: EU countries = Medium to low, EFTA (CH, NO, IS) = Medium; by industry = High at EU level; by company size = none

1.3 | Intensity share (private sector only) | Statistics on occupation and on number of enterprises | M | M | H | EU = High; by geography: EU countries = Medium to low, EFTA (CH, NO, IS) = Medium; by industry = High at EU level; by company size = none

Legend: H = High, M = Medium, L = Low or none

Data are largely available but not complete for all countries; nevertheless, the available data are official statistics and fairly reliable.

The estimation procedure is feasible at EU level, although missing data may make it difficult to provide estimates by country.

2.1.5. Measurement approach


The previous discussion of data issues shows that the number of data workers cannot be calculated directly from the available statistical data and will need to be estimated.

ILOSTAT provides information on employment by country and occupation according to the ISCO classification, but at a rather aggregate level. More detailed Eurostat datasets are available on request. These data will be used as the starting point for a model: after reviewing the possible approaches, and considering the availability and reliability of the data, we designed the estimation approach presented in this paragraph.


The estimation approach is an iterative process that starts from the available data and from our own knowledge of the ICT industry, its main technology and industry trends, and the knowledge economy.

The overall approach is structured on three elements: a set of assumptions; an iterative estimation process; and a validation process that refines the estimate through calibration.

2.1.5.1 Assumptions

Data workers may be employed in most industries of the economy, which may be users or suppliers of data.

As a first approximation, the number and share of data workers depend on the occupational mix and not on industry-specific or country-specific characteristics; therefore, differences between countries or between industries result only from the different occupational mixes of those industries and countries.

There is a positive correlation between the number of data workers and Business Analytics software revenues.

Data workers are positively correlated with software revenues.

Data workers may be employed in both SMEs and large companies. In the early stage of the adoption curve, the use of data and the diffusion of data workers are higher in large companies; as adoption progresses, SMEs increase their use and production of data products and services, and the number of data workers increases.

The number of data workers is not driven by country-specific aspects: once occupation-specific effects and industry mix composition (e.g. firm size composition) are controlled for, country specificities and industry mix explain only residual differentials, not the occupational structure itself.

2.1.5.2 Estimation procedure

The table below presents the main steps of the estimation procedure. It is an iterative process that will require several calibration steps.

2.1.5.3 Outputs

The procedure produces three outputs: the estimated number of data workers; the estimated employment share; and the estimated intensity share (data workers per company).

The table below presents the procedure designed to estimate the number of data workers. The procedure will be based entirely on data collected from official statistics and from private research, since the starting data are available and sufficiently reliable to support an iterative estimation approach.

As explained, the procedure is iterative, and calibration will be important to fine-tune the first estimate. Calibration will rely on the following tools:

calculation of the relevant correlations to revise the scatter plot and the trend line;

identification of residual differentials depending on industry mix composition and on country specificities.
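The two calibration tools listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation. It fits an ordinary least-squares trend line of estimated data workers on a country-level IT maturity index, reports the Pearson correlation, and flags countries whose residual exceeds a chosen threshold for review. The country labels, index values and threshold are hypothetical. Expressing data workers as a share of employment, so that large and small countries are comparable, is also our assumption.

```python
"""Calibration sketch: correlation, OLS trend line and residual screening.

Assumptions (hypothetical, not study data): each country is described by an
IT maturity index and by estimated data workers as a % of total employment.
"""
from math import sqrt


def _means(xs, ys):
    return sum(xs) / len(xs), sum(ys) / len(ys)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = _means(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def trend_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    mx, my = _means(xs, ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def high_residuals(data, threshold):
    """Return (countries with |residual| > threshold, largest first; all residuals).

    data maps country -> (IT maturity index, estimated data worker share).
    """
    countries = list(data)
    xs = [data[c][0] for c in countries]
    ys = [data[c][1] for c in countries]
    slope, intercept = trend_line(xs, ys)
    residuals = {c: data[c][1] - (slope * data[c][0] + intercept) for c in countries}
    flagged = [c for c in countries if abs(residuals[c]) > threshold]
    flagged.sort(key=lambda c: abs(residuals[c]), reverse=True)
    return flagged, residuals


if __name__ == "__main__":
    # Hypothetical countries C1..C6: (IT maturity index, data workers % of employment)
    sample = {"C1": (30, 1.6), "C2": (40, 2.0), "C3": (50, 2.4),
              "C4": (60, 3.8), "C5": (70, 3.2), "C6": (80, 3.6)}
    xs = [v[0] for v in sample.values()]
    ys = [v[1] for v in sample.values()]
    print(f"correlation: {pearson(xs, ys):.2f}")
    flagged, _ = high_residuals(sample, threshold=0.5)
    print("review shares/clusters for:", flagged)
```

Countries flagged in this way are where the shares assigned to ISCO codes, or the composition of the occupational clusters, would be revisited in the calibration step of Table 5.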


Table 5 Procedure for the estimation of the total number of data workers

Procedure for the estimation: Number of data workers - Absolute number

Data inputs | Calculation procedure | Segmentation | Output / Indicator

ISCO-08 occupation codes where data workers may be present, 2014 | Selection of the ISCO occupations where data workers may be present | Codes at 4-digit disaggregation | This is the perimeter where data workers are classified

ILO occupation data for the total of economic activities; Eurostat (on an ad hoc basis): employed persons in 28 countries | To be crossed with 21 ISCO minor groups (3 digits; data at 4 digits are not available) | By country and EU total, sections of the NACE classification of economic activities | This is the quantitative perimeter in which we have to detect the data workers

Matrix of 21 selected ISCO-08 minor groups crossed with 11 sections of NACE economic activities for 28 countries | Ad hoc assumptions to fix the % of each ISCO code that we estimate to be data workers | By country and EU total, sections of the NACE classification of economic activities | These are the quantitative shares of data workers in different groupings of selected ISCO-08 codes

Assumption of 3 to 5 possible % values of data workers in each ISCO minor group (3 digits) | Regrouping of the selected ISCO-08 codes into 3 to 5 occupational clusters with differing data worker intensity | - | These are the homogeneous occupational clusters

Validation by experts of both the % values and the code allocation | Estimation of data workers by country and for each of the 21 minor groups by applying pre-defined shares to homogeneous occupational clusters | By country and EU total, sections of the NACE classification of economic activities | This is the matrix of data workers by NACE section and country, and the derived vector of total data workers by country

IDC index of IT maturity based on IT, Business Analytics and Big Data | Calculation of the correlation between IT maturity by country and the number of estimated data workers | By country, for total economic activities | Scatter plot and trend line of the correlation, which we expect to be high

- | Calibration of the estimated data workers, by looking at cases with high residuals | - | Revised quantitative shares of data workers in ISCO codes and/or revised composition of clusters
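The core step of Table 5 applies pre-defined data worker shares to occupational clusters. For one country, this amounts to a weighted sum over an occupation-by-NACE-section employment matrix. The sketch below is illustrative only. The ISCO-08 minor groups, their allocation to clusters, the cluster shares and the employment figures are all hypothetical assumptions; the study's actual 21 minor groups and their shares are to be set and validated with experts.

```python
"""Sketch: data workers = employment x share of the occupation's cluster.

All codes, cluster allocations, shares and employment figures are hypothetical.
"""
from collections import defaultdict

# Hypothetical share of each occupational cluster counted as data workers.
CLUSTER_SHARE = {"high": 0.50, "medium": 0.20, "low": 0.05}

# Hypothetical allocation of selected ISCO-08 minor groups (3-digit) to clusters.
ISCO_CLUSTER = {"251": "high", "252": "high", "212": "medium", "241": "low"}


def estimate_data_workers(employment):
    """Estimate data workers for one country.

    employment maps (ISCO minor group, NACE section) -> persons employed.
    Returns (data workers by NACE section, total data workers).
    """
    by_section = defaultdict(float)
    for (isco, section), employed in employment.items():
        cluster = ISCO_CLUSTER.get(isco)
        if cluster is None:  # occupation outside the data-worker perimeter
            continue
        by_section[section] += employed * CLUSTER_SHARE[cluster]
    return dict(by_section), sum(by_section.values())


if __name__ == "__main__":
    country_employment = {
        ("251", "J"): 10_000, ("252", "J"): 4_000,
        ("212", "K"): 2_000, ("241", "K"): 20_000,
        ("931", "F"): 50_000,  # not a selected occupation: ignored
    }
    by_section, total = estimate_data_workers(country_employment)
    print(by_section, total)
```

Repeating the call for each of the 28 countries would give the matrix of data workers by NACE section and country. The per-country totals form the derived vector of data workers by country.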


One of the most critical steps is the assignment of % values of data workers to each ISCO minor group and the grouping of ISCO codes into 3 to 5 occupational clusters with different data worker intensity. Both the % values and the grouping will be validated with experts who have consolidated experience in the field of data workers.

Once the number of data workers has been estimated, the other two indicators, the employment share and the intensity share, can be calculated.

Table 6 Procedure for the estimation of employment share and intensity share

Indicator | Data inputs | Calculation procedure | Segmentation | Output / Indicator

Employment share - % of data workers on total employment | Our estimates of data workers by country and by industry; EUROSTAT total employment by country and industry | % of data workers on total employment | By country and by industry | Employment share of data workers

Intensity share - Average number of data workers per company, i.e. ratio of the total number of data workers to the number of companies | Our estimates of data workers by country; Eurostat number of companies by industry | Ratio between data workers and number of companies | By country and by industry | Intensity share of data workers
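Both indicators in Table 6 are simple ratios computed cell by cell (country by industry). Below is a minimal sketch. It assumes the inputs are dictionaries keyed by hypothetical (country, NACE section) pairs. Cells with missing or zero employment or company counts are skipped rather than divided by zero; that choice is ours.

```python
"""Sketch of Table 6: employment share and intensity share per cell.

Keys and figures are hypothetical; inputs are keyed by (country, NACE section).
"""


def employment_share(data_workers, total_employment):
    """% of data workers on total employment, per cell."""
    return {
        cell: 100.0 * dw / total_employment[cell]
        for cell, dw in data_workers.items()
        if total_employment.get(cell)  # skip missing or zero employment
    }


def intensity_share(data_workers, n_companies):
    """Average number of data workers per company, per cell."""
    return {
        cell: dw / n_companies[cell]
        for cell, dw in data_workers.items()
        if n_companies.get(cell)  # skip missing or zero company counts
    }


if __name__ == "__main__":
    dw = {("C1", "J"): 7_000, ("C1", "K"): 1_400}
    emp = {("C1", "J"): 140_000, ("C1", "K"): 280_000}
    firms = {("C1", "J"): 10_000, ("C1", "K"): 2_000}
    print(employment_share(dw, emp))
    print(intensity_share(dw, firms))
```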

2.1.5.4 Sanity check

To assess our estimates, we will cross-check them against the following data and information:

available estimates of big data workers (McKinsey, 2011; IBM);

occupational trends, employment share and intensity share in the ICT sector.

2.1.6. Qualitative interviews

The measurement approach designed above requires several assumptions and decisions about the best way to estimate the indicator. Rather than carrying out interviews with statistical experts in the indicator design phase, as initially planned, we believe that it will be more useful to carry out these interviews in the next phase when attempting to measure the indicators, so as to validate our assumptions and our measurements.

We have already selected and contacted these key experts, with whom we hope to establish a fruitful collaboration during the measurement process rather than a simple interview. In fact, we have found that most experts in the field consider the monitoring of data workers a critical issue.

The experts identified are:

Vincenzo Spiezia, head of the Internet economy measurement unit of the OECD Economic Analysis and Statistics Division (EAS/STI), who has just returned to the OECD after three years at the ILO. Spiezia recently presented a position paper on the OECD research agenda on ICT, skills and jobs for the near future.

Christian Reimsbach-Kounatze, also from the OECD, author of the paper “Exploring Data-Driven Innovation as a New Source of Growth”.

Tobias Huesing, from empirica, lead researcher on the e-skills supply-demand quantitative model developed in collaboration with IDC, which is the source of the e-skills gap data frequently quoted by Commissioner Kroes.


2.2. Indicators 2 and 3: Number and Revenues of Data Companies

Paragraph 2.1 presented the foundation of the analysis approach. As explained, the approach is micro-economic, based on the analysis of the market building blocks, which means analyzing both the supply and the demand of this emerging market. For a complete analysis it is therefore important to distinguish and define suppliers and users, in order to better explore the value of the market and the impacts this new industry may have on the economy.

In the first version of this Methodology, Indicators 2 and 3 focused mainly on supply-side analysis, based on the identification of data companies through an ad hoc survey of selected sectors. After in-depth discussion with the EC, we decided that, for the quality and depth of the study, it is important to collect information on data users as well. We are therefore planning an ad hoc survey covering all economic sectors, which should enable us to produce two indicators: one on the number of data companies (supply side) and one on the number of data users (demand side). The methodological approach and the trade-offs of this choice are explained below.

2.2.1. Definition and statistical reference

For the purposes of this study, we have classified data-related companies into two main groups:

Data companies are data suppliers, meaning that their main activity is the production and delivery of data-related products, services and technologies. These companies constitute the emerging data industry.

Data users are organisations that rely intensively on data to accomplish their mission: they generate and exploit their own data, collect online customer data intensively, subject these data to sophisticated analyses (such as controlled trials and data and text mining), and use what they learn to improve their business. We will use the survey to operationalise and specify this definition.

2.2.1.1 Data Companies

Data companies may be start-ups, innovative SMEs or spin-offs of larger enterprises. Most of them originate from, or are currently classified within, the ICT industry, because the core technology they use is Big Data technology. Traditional information services companies (for example publishers of online directories, credit information and market research companies) are classified as data companies if they develop and deliver the innovative data-related products and services identified by the study using advanced data technologies. However, we do not consider the media and publishing sector as a whole to be part of the data supply industry, since its main activity is focused on communication and entertainment rather than on the production and exploitation of data.

This definition will be operationalised based on IDC’s taxonomy of Big Data and Business Intelligence technologies and services. It should be clear that since this is an innovative, emerging industry the definition must be flexible and open to adjustments through the ad-hoc survey.

As presented in our Taxonomy 3.0 (annexed), Figure 3 shows the classification of the main categories of data market suppliers to be analysed in the study, expanding the categories represented in the Data Value Chain figure. As Figure 3 shows, most data companies originate from the ICT sector, but not all.


The main groups we have identified are the following:

Integrated suppliers, including:

o Traditional information services (databases, market research organizations) en route to adopting the most recent big data technologies and updating their offering;

o Vertically integrated suppliers: large organizations that leverage their own data to create a new business providing specific data-related services (telecom operators, utilities, financial services), or OTT (over-the-top) global players such as Google, Facebook, Microsoft or IBM.

For the purposes of this study, for these categories of suppliers we will take into account only the activities, revenues and employees related to the provision of data-based services.

New/specialised intermediaries are organizations whose core business is to develop and sell tools, products and/or services based on the re-use of data (including storage, aggregation and analysis) to other organizations. They can be cross-sector or specialised in specific vertical markets, and can be classified as follows:

o Providers of data marketplaces and data platforms

o Providers of data analytics products and services

o Providers of vertical solutions / mobile apps / cloud apps / big data apps

ICT enablers, including:

o Providers of software and tools

o Providers of business & IT services

ICT infrastructure providers, including:

o Cloud computing providers

o Providers of platforms & IT infrastructure

o Connectivity infrastructure providers

Figure 3 Classification of Data Companies

[Figure: data companies grouped as Integrated suppliers; New Intermediaries (Data Marketplaces, Data Platforms, Data Brokers; Analytics; Vertical Solutions / Mobile Apps / Cloud Apps / Big Data Apps); and ICT enablers and infrastructures (Tools and Technologies; Business & IT services; Cloud Computing; Platform & IT Infrastructure; Connectivity Infrastructure).]

Source: IDC 2014


2.2.1.2 Data users

The demand side of the data market is potentially represented by all enterprises, since every organization uses data. Some industries are notably intensive users of data: finance, healthcare and retail (particularly e-commerce) are industries where data play a relevant and strategic role in decision making, and most companies hold important datasets, use them and in some cases sell or exchange them. Some of these companies do not need the contribution of specialised intermediaries or ICT enablers, because they have all the resources necessary to implement data products and services on their own.

Our indicator should identify the companies that make intensive use of the new data technologies. According to a recent UK study, “The rise of the datavores”, about 18 per cent of firms with active online operations can be considered “datavores”, because they gather online customer data intensively, subject these data to sophisticated analyses and use what they learn to improve their business. These firms also report being more innovative than their competitors, in products as well as processes. There is considerable room for growth, since over 40% of the firms surveyed are only casual data users.

2.2.1.3 When data users become data suppliers

The boundaries between demand and supply are not clear-cut: companies that develop a good capability to exploit their own data may in turn become resellers of that data to third parties. This is especially relevant for enterprises active in the B2C market, which increasingly monitor their customers' activities and collect data. One example is the UK retail giant Tesco, which, according to media reports, sells information on the spending habits of shoppers, including the 16 million members of its Clubcard loyalty scheme, amid protests from consumer organizations and data privacy concerns. Another is the telecom operator Telefonica with its Dynamic Insights division. However, the revenues generated by the new data business are clearly still marginal for these companies, even though precise estimates are lacking. As the data business grows, it may be spun off into separate companies. More often, B2C companies will rely on the services of specialised intermediaries (the data companies of our definition) to develop their data business, and will therefore be monitored indirectly by our data industry analysis.

For the measurement of our indicator, we have therefore decided the following:

Traditional companies with a division or business unit dedicated to the development of data products and services are not data companies; if the division or business unit becomes a separate company (a spin-off) then it becomes a data company in its own right and is included in our definition of data industry.

At this stage, in the main B2C sectors such as finance, retail and durable consumer goods (automotive), there are still significant barriers to the resale of customer data to third parties, and in Europe the cases of such companies becoming data suppliers still seem to be limited. Moreover, it would be misleading to include in the estimate of the data market the full revenues of a company such as Tesco simply because it has a marginal data-reselling activity. These cases need further research before we can include them in our measurement of data companies.

On the other hand, ICT companies (particularly ISPs), which generate high volumes of data on the Internet, appear to be entering the data market rapidly, partly because of their affinity with the new data technologies. We will therefore track the ICT companies entering the data business (such as telecom operators) and decide whether to count them as data companies depending on their level of activity and focus on the data market.

In summary, by researching both the demand and the supply side of the data market, we should be able to identify companies operating on both sides.


2.2.1.4 Statistical classification of the data companies

As for the data workers, it is necessary to define the category of data companies from a statistical point of view. Since data companies are not classified as such in official statistics, we need to identify where, within official statistics, they can be found.

According to our preliminary classification of data companies, they are likely to be concentrated mainly in two statistical sectors, Information and Communication and Professional, Scientific and Technical activities.

The Annex presents the selection of NACE rev2 codes for the economic activities where data companies may fall. For completeness, the Annex also subdivides these codes according to whether data companies are found on the supply side or on the demand side.

The criteria used to define the perimeter of the data companies are the following:

We have included the NACE sections where specialised intermediaries and ICT enablers operate

In some NACE sections, companies that are neither specialised intermediaries nor ICT enablers (i.e. that have a different core business) may start new business units dedicated to collecting and aggregating data products and services. Nevertheless, for the moment and for the purposes of this study, we exclude these companies, because they currently represent a marginal part of the data industry.

We have excluded the companies collecting and implementing data products and services for their own use; we only consider as data suppliers the companies selling data products and services and therefore achieving revenues.

The sections of the NACE rev2 in which we can find data companies are:

Section J, Information and Communication;

Section M, Professional, Scientific and Technical Activities.

The codes selected for Section J and Section M are presented in the two tables below.

Table 7 Selection of codes from Section J, NACE rev2, where data companies may be classified

SECTION J - INFORMATION AND COMMUNICATION

Code | Level | Description | Status

58 | Division | Publishing activities | Included

58.12 | Class | Publishing of directories and mailing lists | Included

62 | Division | Computer programming, consultancy and related activities | Included

62.0 | Group | Computer programming, consultancy and related activities | Included

62.01 | Class | Computer programming activities | Included

62.02 | Class | Computer consultancy activities | Included

62.03 | Class | Computer facilities management activities | Included

62.09 | Class | Other information technology and computer service activities | Included

63 | Division | Information service activities | Included

63.1 | Group | Data processing, hosting and related activities; web portals | Included

63.11 | Class | Data processing, hosting and related activities | Included

63.9 | Group | Other information service activities | Included

63.99 | Class | Other information service activities n.e.c. | Included

Table 8 Selection of codes, Section M NACE rev2, where data companies may be classified

SECTION M - PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES

Code | Level | Description | Status

70 | Division | Activities of head offices; management consultancy activities | Included

70.2 | Group | Management consultancy activities | Included

70.22 | Class | Business and other management consultancy activities | Included

72 | Division | Scientific research and development | Included

72.2 | Group | Research and experimental development on social sciences and humanities | Included

72.20 | Class | Research and experimental development on social sciences and humanities | Included

73 | Division | Advertising and market research | Included

73.1 | Group | Advertising | Included

73.2 | Group | Market research and public opinion polling | Included

73.20 | Class | Market research and public opinion polling | Included

74 | Division | Other professional, scientific and technical activities | Included

74.9 | Group | Other professional, scientific and technical activities n.e.c. | Included

74.90 | Class | Other professional, scientific and technical activities n.e.c. | Included
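To apply Tables 7 and 8 to business register or SBS extracts, the selected codes can be turned into a simple lookup. The sketch below reads the tables in a particular way, and that reading is an assumption. Where a table lists a class, only that class is in scope. Where it lists a group with no class beneath it (73.1 Advertising), the whole group is in scope. Under this reading, class 63.12 (Web portals) would fall outside the perimeter. The function names are hypothetical.

```python
"""Sketch: NACE Rev. 2 supply-side perimeter from Tables 7 and 8.

Reading of the tables (an assumption): listed classes are in scope on their
own; a listed group with no listed class beneath it (73.1) is in scope whole.
"""

# Most detailed codes listed in Table 7 (Section J) and Table 8 (Section M).
PERIMETER_PREFIXES = (
    "58.12", "62.01", "62.02", "62.03", "62.09", "63.11", "63.99",
    "70.22", "72.20", "73.1", "73.20", "74.90",
)


def in_supply_perimeter(nace_class):
    """True if a 4-digit NACE class code such as '62.01' is in scope."""
    return nace_class.startswith(PERIMETER_PREFIXES)


def nace_section(nace_class):
    """Section letter for the two sections used here (J: 58-63, M: 69-75)."""
    division = int(nace_class.split(".")[0])
    if 58 <= division <= 63:
        return "J"
    if 69 <= division <= 75:
        return "M"
    return None


if __name__ == "__main__":
    for code in ("62.01", "63.12", "73.12", "58.11", "74.90"):
        print(code, nace_section(code), in_supply_perimeter(code))
```

The prefix match lets a group-level entry such as 73.1 cover its classes (73.11, 73.12) without listing them one by one.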

2.2.1.5 Statistical classification of the data user companies

According to our research hypotheses, every company or organisation is potentially a data user, so all NACE codes should logically be included. We have aggregated the main sectors so as to be able to develop a realistic sample and analysis.

Table 9 Main industries and NACE codes where users may be classified

Industry segmentation | NACE rev2 section(s)

Mining, Manufacturing | B-C

Electricity, gas and steam, water supply, sewerage and waste management | D-E

Construction | F

Transport and storage | H

Information and communications | J

Finance | K

Public Administration and Defence; Compulsory Social Security | O

Education | P

Human health activities | Q

Wholesale and retail trade; repair of motor vehicles and motorcycles; Accommodation and food services | G, I

Professional services, administrative and support services | L, M, N
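Table 9 can be read as a mapping from NACE section letters to the eleven industry segments used for data users. The sketch below aggregates company counts by section into those segments. It also reports the sections that Table 9 does not cover (A, R, S, T and U), so that they are excluded consciously rather than silently. Segment labels are shortened from Table 9, and the company counts are hypothetical.

```python
"""Sketch: aggregate counts by NACE section into the Table 9 user segments.

Segment labels are shortened from Table 9; company counts are hypothetical.
"""
from collections import Counter

SEGMENT_SECTIONS = {
    "Mining, manufacturing": "BC",
    "Utilities, water and waste": "DE",
    "Construction": "F",
    "Transport and storage": "H",
    "Information and communications": "J",
    "Finance": "K",
    "Public administration": "O",
    "Education": "P",
    "Human health": "Q",
    "Trade, accommodation and food": "GI",
    "Professional, administrative and support services": "LMN",
}
# Reverse lookup: section letter -> segment label.
SECTION_TO_SEGMENT = {s: seg for seg, secs in SEGMENT_SECTIONS.items() for s in secs}


def aggregate_by_segment(counts_by_section):
    """Return (totals per segment, sorted list of sections not in Table 9)."""
    totals = Counter()
    unmapped = []
    for section, count in counts_by_section.items():
        segment = SECTION_TO_SEGMENT.get(section)
        if segment is None:
            unmapped.append(section)
        else:
            totals[segment] += count
    return dict(totals), sorted(unmapped)


if __name__ == "__main__":
    counts = {"B": 50, "C": 950, "G": 3_000, "I": 1_200, "R": 400}
    print(aggregate_by_segment(counts))
```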


2.2.2. Description of the indicator

Table 10 Indicator 2: Number of data companies

Indicator 2 - Description

N. | Name | Description | Type and Time | Segmentation

2.1 | Number of data companies | Total number of data companies in the EU, measured as legal entities based in one EU country | Absolute number, 2013-2014 est. | By geography: 28 EU MSs + total EU; by industry: 2 selected NACE rev2 sections (Section J Information and Communication; Section M Professional, scientific and technical activities); by company size: below 250 employees, above 250 employees

2.2 | Share of data companies | Total number of data companies / total number of companies in Sections J and M | %, 2013-2014 est. | By geography: 28 EU MSs + total EU

2.3 | Number of data users | Total number of data users in the EU, measured as legal entities based in one EU country | Absolute number, 2013-2014 est. | By geography: 28 EU MSs + total EU; by industry: 11 selected NACE rev2 sections (see Table 9); by company size: above/below 250 employees

As indicated in Chapter 2.9, the market monitoring carried out under Indicator 2 will be extended to three leading international competitors of the EU in the data market: the US, China or Japan, and Brazil. The approach will be similar to the one described above for the EU and is detailed further in Paragraph 2.9.1 of the present report.
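Indicator 2.2 is a ratio. One consequence is that the EU-level figure is best computed as a ratio of sums (all data companies over all companies in Sections J and M). An average of national shares would instead give small countries the same weight as large ones. Below is a minimal sketch with hypothetical counts for two countries.

```python
"""Sketch of Indicator 2.2 (share of data companies), hypothetical counts."""


def share_of_data_companies(data_companies, companies_j_m):
    """Return ({country: % share}, EU-level % share as a ratio of sums).

    data_companies / companies_j_m map country -> number of legal entities;
    companies_j_m counts all companies in NACE Sections J and M.
    """
    shares = {c: 100.0 * n / companies_j_m[c] for c, n in data_companies.items()}
    eu_total = sum(companies_j_m[c] for c in data_companies)
    eu_share = 100.0 * sum(data_companies.values()) / eu_total
    return shares, eu_share


if __name__ == "__main__":
    shares, eu = share_of_data_companies({"C1": 300, "C2": 100},
                                         {"C1": 10_000, "C2": 20_000})
    # The EU share (ratio of sums) differs from the simple average of shares.
    print(shares, round(eu, 2))
```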

Table 11 Indicator 3: Revenues of data companies

Indicator 3 - Description

N. | Name | Description | Type and Time | Segmentation

3.1 | Total revenues of data companies | Total revenues generated by the companies specialised in the supply of data-related products and services | Billion €, 2013-2014 est. | By geography: 28 EU MS + total EU; by company size: below 250 employees, above 250 employees

3.2 | Share of data companies' revenues | Ratio between data companies' revenues and the total revenues of companies in the data perimeter | % of total revenues, 2013-2014 est. | By geography: 28 EU MS + total EU; by company size, if possible: below 250 employees, above 250 employees

3.3 | Productivity | Average revenues per employee of data companies | Million €, 2013-2014 est. | By geography: 28 EU MS + total EU; by company size: below 250 employees, above 250 employees

As indicated in Chapter 2.9, the market monitoring carried out under Indicator 3 will be extended to three leading international competitors of the EU in the data market: the US, China or Japan, and Brazil. The approach will be similar to the one described above for the EU and is detailed further in Paragraph 2.9.1 of the present report.
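Indicators 3.1 to 3.3 can be derived from company-level revenue and employment records once data companies have been identified. The sketch below rests on stated assumptions:

- The `Company` record and its fields are hypothetical.
- All companies passed in are assumed to belong to the data perimeter.
- A company with exactly 250 employees is placed in the upper size class, following the EU SME definition (fewer than 250 employees).

Productivity is returned as revenue per employee, in the same currency unit as the revenue inputs.

```python
"""Sketch of Indicators 3.1-3.3 by size class; records are hypothetical."""
from dataclasses import dataclass


@dataclass
class Company:
    revenue_eur: float
    employees: int
    is_data_company: bool


def size_class(employees):
    # EU SME threshold: fewer than 250 employees.
    return "below 250" if employees < 250 else "250 and above"


def revenue_indicators(companies):
    """Return {size class: {"revenues", "revenue_share_pct", "revenue_per_employee"}}.

    `companies` is every company in the data perimeter; only those flagged as
    data companies contribute to revenues and productivity.
    """
    acc = {}
    for co in companies:
        cls = acc.setdefault(size_class(co.employees),
                             {"data_rev": 0.0, "all_rev": 0.0, "data_emp": 0})
        cls["all_rev"] += co.revenue_eur
        if co.is_data_company:
            cls["data_rev"] += co.revenue_eur
            cls["data_emp"] += co.employees
    out = {}
    for name, a in acc.items():
        share = 100.0 * a["data_rev"] / a["all_rev"] if a["all_rev"] else None
        per_emp = a["data_rev"] / a["data_emp"] if a["data_emp"] else None
        out[name] = {
            "revenues": a["data_rev"],        # Indicator 3.1
            "revenue_share_pct": share,       # Indicator 3.2
            "revenue_per_employee": per_emp,  # Indicator 3.3
        }
    return out


if __name__ == "__main__":
    sample = [
        Company(2_000_000, 20, True), Company(8_000_000, 80, False),
        Company(60_000_000, 300, True), Company(40_000_000, 500, False),
    ]
    for cls, values in revenue_indicators(sample).items():
        print(cls, values)
```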

2.2.3. Main data sources

The key inputs for indicators 2 and 3 closely follow those identified in section 2.1.3 for the data worker population:

Structural Business Statistics (SBS), to identify the total number of companies and the revenues generated by country, size and vertical market. Given the finer NACE granularity needed to define the perimeter in which data-supply companies are found, data are sourced directly from the services-specific section of the SBS database, which provides data at NACE Rev. 2 group level (three digits). Some estimation will therefore be needed to obtain class-level data (see 2.3.3).

Internal IDC sources on the supply-side revenues of players in the data market, including databases on software vendors (worldwide, country level, by type of product/application) and on Big Data and Cloud Computing. The database classifies software vendors by several functional markets across applications, application development and deployment, and system infrastructure, including data analytics and Big Data software vendors. It collects revenues for most data analytics software vendors at worldwide and country level.

Identifying start-ups is an important component of the data-supply indicator, and venture capital investment is an important indicator to factor in. Eurostat publishes data collected through the European Private Equity and Venture Capital Association (EVCA) survey of all private equity and venture capital companies. The data are presented by stage (seed, start-up and later-stage venture) and cover the EU15 Member States, Bulgaria, the Czech Republic, Hungary, Poland, Romania, Norway and Switzerland, with no breakdown by sector.

IDC also reviewed information on start-ups, identifying useful studies that help put the most recent trends in context:


Statistical Compendium, EBAN 2014, an annual survey-based research publication on the activity of business angels and business angel networks.

EC-BIC Observatory 2013 and the last-3-year trends publication from the EBN Innovation Network, which provides an overview of key facts and figures of the innovation-based industry in Europe (2010-2012 in the case of the latest observatory)

National level studies when available, such as Startups in Italy Facts and Trends (source: MTB, CrESIT)

The analysis of the available sources shows that too many elements are missing to complete the quantification of data-supply companies and start-ups. Further inputs will be needed, collected through ad hoc surveys of supply-side companies and of accelerators and incubators across Europe.


Table 12 Key External Datasets and Sources for the estimation of the data-supply companies and revenues

| Data owner | Domain | Title | Country coverage | Sector coverage |
|---|---|---|---|---|
| EVCA, published on Eurostat | Private Equity/Venture Capital | Venture Capital Investment by detailed stage of development (from 2007, source: EVCA) | EU15 Member States, BG, CZ, HU, PL, RO, NO and Switzerland | - |
| EBAN | Start-ups/Spin-Offs | Statistical Compendium, EBAN 2014 | UK, AT, BE, BG, HR, CY, DK, EE, FI, FR, DE, EL, IE, IT, LT, LU, NL, NO, PL, PT, Russia, RS, ES, SE, Switzerland, TR | ICT, Biotech & Life sciences, Mobile, Manufacturing, Healthcare/Medtech, Energy, Environment and Cleantech, Retail and Distribution, Logistics and Transport, Creative Industries, Finance and Business Services, Impact Investing, Other |
| MIND THE BRIDGE | Start-ups/Spin-Offs | Startups in Italy Facts and Trends, 2012 | IT | Clean Tech, Life Science, Consumer Products, Web-based, ICT, Electronics, Machinery, Other Industry Sector |
| EC-BIC | Start-ups/Spin-Offs | EC-BIC Observatory 2013 and the last-3-year trends | PT, FI, UK, DE, FR, ES, BE | - |


Selected qualitative interviews with key data industry representatives are foreseen:

Juan Mateos Garcia, author of the Datavores study

Michail Skaliotis, responsible for the Big Data task force at Eurostat

2.2.4. Gap analysis

| N. | Indicator name | Type of data | Availability of data: public sources | Availability of data: IDC & other private sources | Quality and reliability of data | Feasibility of indicator | Segmentation |
|---|---|---|---|---|---|---|---|
| 2.1 | Number of data companies | Statistics on enterprises and ICT market | Low | Medium | Medium to Low (incomplete coverage) | Feasible only through ad-hoc survey | Geography = EU28; company size = below/above 250 employees |
| 3.1 | Total revenues of data companies | Statistics on revenues by sector and by company size | Low | Low | Low | Feasible only through ad-hoc survey | Geography = EU28 if possible; company size = below/above 250 employees if possible |
| 3.2 | Share of data companies' revenues on total | Statistics on revenues by sector and by company size | Low | Low | Low | Feasible only through ad-hoc survey | Geography = EU28 if possible; company size = below/above 250 employees if possible |
| 3.3 | Productivity | | Low | Low | Low | Feasible only through ad-hoc survey | Geography = EU28 if possible; company size = below/above 250 employees if possible |

Legend: High, Medium, Low or none

Data on companies are characterized by low availability and reliability, mainly because of incomplete coverage. Data collection will therefore need to be completed with a survey in order to produce estimates.


2.2.5. Field research surveys

The gap analysis presented in the previous paragraph shows that the necessary data are so scarce that the estimation procedure cannot be based on available data alone.

Specifically, we do not know precisely who the companies are or how they behave. To compensate for the lack of data, we therefore need to collect fresh primary data through a field research effort. We identify two alternative approaches:

Approach A: Survey of data companies (data suppliers) and survey of accelerators and incubators (start-ups); or:

Approach B: Survey of data companies (data suppliers) and data users.

Approach A will consist of two distinct surveys: each survey will focus on a very specific sector (data companies supplying data in the data market and belonging to NACE Rev.2 Sections J and M only; accelerators and incubators) and will be based on a relatively small sample of respondents.

Approach B will consist of one survey only. It will encompass both data companies (companies and organisations offering data and making it available to others) and data users (companies and organisations actually using data), so that both the supply side and the demand side of the data market can be investigated. The sample of potential respondents will be larger and its representativeness higher, but the survey will include no specific focus on start-ups.

Approach A: Survey of data companies (data suppliers) and survey of accelerators and incubators (start-ups)

2.2.5.1 Survey of data companies

The main objectives of this survey are to:

Gain a better understanding of which companies supply data products and services

Collect data about their revenues and their employment

Gain a better understanding of their performance and their customers

The survey will be based on a CAWI technique and will be finalised in early September in order to launch it by the end of September.

Selection of the survey sample

The budget and time constraints of the tender do not allow field research to be carried out in all 28 Member States. We will therefore select a sample of countries representative of the EU market. To select this sample, we suggest using parameters that can be considered enabling conditions for the production and use of data products and services. The following table shows a selection of indicators that can be used to select the most representative countries and to cluster them on the basis of their similarity.


Table 13 Indicators of selection of MS for the survey

| Type of indicator | Indicator | Indicative sources |
|---|---|---|
| General | Population size | Eurostat |
| | GDP growth rate | Eurostat |
| | Geographic area (North, South, East, West) | Eurostat |
| ICT readiness and diffusion | IT spending on GDP (%) | IDC |
| | Fixed and mobile Internet penetration | IDC |
| | Broadband connections penetration (incl. urban/rural) | IDC, DAE |
| | Market concentration (leading operators' market share) | NRAs, IDC |
| Culture and Technology | % of population that has never used the Internet | Eurostat, DAE |
| | Digital literacy (competence) indicators | Eurostat, DAE |
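Table 13 lists the indicators, but the report does not state which clustering method will be applied. One possible approach, sketched here under that assumption, has two steps. First, standardise each indicator to z-scores, so that population, percentages and spending ratios carry comparable weight. Second, group the countries with a simple k-means. The country labels and values below are placeholders, not Eurostat or IDC figures.

```python
from statistics import mean, pstdev


def standardise(indicators: dict[str, list[float]]) -> dict[str, list[float]]:
    """Convert each indicator column to z-scores across countries."""
    columns = list(zip(*indicators.values()))
    # Guard against a constant column (standard deviation of zero).
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        country: [(v - m) / s for v, (m, s) in zip(values, stats)]
        for country, values in indicators.items()
    }


def kmeans(points: dict[str, list[float]], seeds: list[str],
           iterations: int = 20) -> dict[int, list[str]]:
    """Plain k-means; `seeds` names the countries used as initial centroids."""
    centroids = [points[s] for s in seeds]
    clusters: dict[int, list[str]] = {}
    for _ in range(iterations):
        # Assign each country to the nearest centroid (squared Euclidean distance).
        clusters = {i: [] for i in range(len(centroids))}
        for country, p in points.items():
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(country)
        # Recompute centroids, keeping the old one if a cluster is empty.
        updated = [
            [mean(col) for col in zip(*(points[c] for c in members))] if members else centroids[i]
            for i, members in clusters.items()
        ]
        if updated == centroids:
            break
        centroids = updated
    return clusters


# Placeholder values: [IT spending on GDP (%), Internet penetration (%)]
sample = {"MS1": [3.0, 90.0], "MS2": [2.9, 88.0], "MS3": [1.5, 60.0], "MS4": [1.6, 62.0]}
print(kmeans(standardise(sample), seeds=["MS1", "MS3"]))
# {0: ['MS1', 'MS2'], 1: ['MS3', 'MS4']}
```

k-means is only one option. Hierarchical clustering on the same standardised indicators would serve equally well. In either case, the number of clusters and any weighting of indicators remain judgement calls.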

Based on IDC experience, a sample of 9 Member States will be sufficient to represent the EU market. Applying the parameters above leads to the following indicative sample of 9 countries to be covered by the survey:

1. UK

2. Sweden

3. Czech Republic

4. France

5. Germany

6. Hungary

7. Spain

8. Poland

9. Italy

The selected countries account for some 79% of EU28 GDP, represent a good North-South and East-West balance, and also provide a good balance across different levels of IT sophistication.

The EFTA countries are excluded from the survey.

The final country selection will be discussed and validated with the Commission in the first phase of the study.


The sample will consist of approximately 1000 organisations. In each country the sample will be representative of the selected sectors, with soft quotas by company size.

Enterprise sizes are based on the number of persons employed and are aggregated into two segments: below 250 employees and above 250 employees. Enterprises with fewer than 250 employees must also be independent companies, not local subsidiaries of large multinationals.
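The size rule can be expressed as a screening function. A minimal sketch follows, with two assumptions. First, a company with exactly 250 employees is placed in the upper segment, in line with the "250+" band used in Table 15. Second, a small local subsidiary of a multinational is returned as ineligible for the below-250 segment, because the text does not say whether such firms are excluded or attributed to their group. The function and segment labels are hypothetical.

```python
from typing import Optional

SIZE_THRESHOLD = 250  # employees, as stated in the text


def size_segment(employees: int, independent: bool) -> Optional[str]:
    """Return the survey size segment for a respondent, or None if the
    respondent fits neither segment under the stated rule."""
    if employees < 0:
        raise ValueError("employees must be non-negative")
    if employees >= SIZE_THRESHOLD:
        return "250 employees and above"
    if independent:
        return "below 250 employees"
    # Assumption: a local subsidiary of a large multinational with fewer than
    # 250 employees does not qualify for the below-250 segment.
    return None


print(size_segment(120, independent=True))
print(size_segment(120, independent=False))
```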

The survey will gather valuable data to estimate the companies indicator and the revenues indicator. It will also enable us to gather relevant inputs on the demand and the customers faced by the sampled companies.

Finally, we will take the opportunity to ask respondents about their expected future performance, in order to gather inputs for the forecasts.

2.2.5.2 Survey on start-ups

An accurate measurement of the number and revenues of data companies in Europe needs to give special consideration to newly created companies, partnerships and temporary organisations, which are presumably still in a development phase and in search of new markets. These newly created companies (Start-Ups) include a large number of entities focused on the production and/or delivery of data-related products, services and technologies. They therefore represent a fundamental facet of data companies as defined previously in 2.3.1.

The peculiarities of Start-Ups (i.e. their young age, their small size in terms of number of employees, their fluctuating revenues, etc.) further complicate the search for the data required to measure indicators 1 and 2. IDC will therefore conduct an additional survey, in parallel with the Business Survey described above. Its purpose is to obtain a representative picture of Start-Ups' data products and services, their performance and their customers. More importantly, it will collect data on their revenues and employment in order to estimate the indicators considered in this section.

Design of the survey

Given the limited statistics available in this field, the Start-Up survey will be preceded by intensive desk research to identify the most appropriate sources of data and information about Start-Ups in Europe. The desk research phase will build on the main data sources already identified in section 2.3.3 above. In particular, IDC will carefully review the following sources:

the Statistical Compendium 2014 by the European Trade Association for Business Angels, Seed Funds and other Early-Stage Market Players (EBAN)

the EC-BIC Observatory 2013 published by the European Business and Innovation Centre Network (EBN)

other relevant secondary sources at national and European level

These sources will be complemented by existing research conducted by IDC's EMEA Emerging Technology group. The group provides market analysis on new technologies, with special interest in the start-up scene and in the R&D and M&A activities of multinational technology companies operating in Europe and worldwide.

Based on these initial secondary sources, IDC will carry out systematic desk research to map the landscape of organisations holding data on Start-Ups in Europe. This mapping exercise will include, but not be limited to, the following categories of entities:

Business Angels; Business angels networks; Federations of business angels networks; Early stage venture capital funds; Business accelerators; Business incubators; Associates/other early stage market players; Universities and Scientific Parks.


As an example, and in addition to the EBAN and EBN organisations already mentioned above, IDC's mapping will comprise other relevant entities such as:

At European level:

BAE, Business Angels Europe, http://www.businessangelseurope.com/

The European Accelerators' Assembly, http://www.acceleratorassembly.eu/

The European Investment Fund and the European Angels Fund, http://www.eif.org/

The European Private Equity & Venture Capital Association, http://www.evca.eu/

At national level:

I2 Business Angels, the Austrian Business Angel Network, www.awsg.at

AEBAN, Asociación Española de Business Angels, www.aeban.es

APBA, Portuguese Business Angels Association, www.apba.pt

BAN Nederland, Business Angels Networks, Netherlands, www.bannederland.nl

BeBAN (Belgium), www.beagnels.eu

France Angels, www.franceangels.org

IBAN, Italian Business Angels Network Association, www.iban.it

UK Business Angels Association, www.ukbusinessangelsassociation.org.uk

UK Business Incubation (UKBI)

The Dutch Incubator Association (DIA)

Expected Results

We expect the desk research outlined above to generate a long list of entities holding relevant data and information on Start-Ups at national and European level across the EU28. We reasonably expect this long list to include 150-200 relevant entities, thus capturing the most active and up-to-date organisations on the European Start-Up scene today. The list will not cover the entire universe under investigation, but it is likely to encompass a large part of it.

Implementation of the survey

The long list generated by the desk research will be used to launch a questionnaire survey targeted at the accelerators and incubators hosting Start-Ups. The questionnaire will be directed at the general managers/ directors of the incubators/ accelerators and will aim at collecting data on:

The number of Start-Ups and Spin-Offs hosted by the structure

The technology areas covered, ideally specifying the share represented by data-related companies

The main characteristics of these Start-Ups: size, age, market segment targeted, growth perspectives, funding sources

The growth dynamics of start-ups by technology area, comparing data-related companies with the other technology segments

Expectations of development in the coming years

Barriers to and drivers of growth of start-ups (those specific to data start-ups vs. those common to all)

This survey will be conducted in parallel with the one described above. Like the data companies survey, the Start-Up survey will be based on semi-structured interviews administered via CAWI. The structured questions will allow results to be compared and a common framework to be built across respondents. The Start-Up survey will be finalised in early September and is expected to be launched by the end of the same month.

The survey will not cover all of the EU28, for cost and opportunity reasons. However, the suggested sample of countries should be larger than that of the data companies survey, covering 15 countries instead of 9. A quick review of secondary sources points to Austria, Denmark, Estonia, Finland, Ireland and the Netherlands as countries to consider when examining the European Start-Up scene. These sources include the World Bank's Ease of Doing Business Index, the Global Competitiveness Report published annually by the World Economic Forum, and the Index of Economic Freedom published by the Heritage Foundation and the Wall Street Journal. EBAN's Statistical Compendium 2014 confirms this. It outlines the evolution of angel investment by country over the past two years (2012-2013) and reports angel activity relative to each country's GDP (the ratio of angel investment to GDP).

The sample of the Start-Up survey will indicatively therefore cover 15 countries, as follows:

1. UK

2. Sweden

3. Czech Republic

4. France

5. Germany

6. Hungary

7. Spain

8. Poland

9. Italy

10. Austria

11. Denmark

12. Estonia

13. Finland

14. Ireland

15. Netherlands

The final country selection will be discussed and validated with the Commission before the actual launch of the survey. We anticipate a sample of approximately 200-250 organisations across the 15 countries identified above, with a target of at least 80-100 completed interviews.
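Read together, these two figures imply the response rate the Start-Up survey needs to achieve. The sketch below assumes that all 200-250 organisations are contacted. It computes the two ends of the implied range; the function name is illustrative.

```python
def response_rate(contacted: int, completed: int) -> float:
    """Share of contacted organisations that complete the questionnaire."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return completed / contacted


# Extremes of the range implied by the targets in the text.
lowest = response_rate(contacted=250, completed=80)
highest = response_rate(contacted=200, completed=100)
print(f"implied response rate: {lowest:.0%} to {highest:.0%}")
# implied response rate: 32% to 50%
```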

The survey will gather valuable data to estimate the number and growth of Start-Ups in Europe, and possibly their revenues and jobs created. It will also provide a comprehensive picture of European Start-Ups’ data products and services, their performances and their customers adding qualitative value to the measurement of indicators.

The results of this survey will feed into the estimate of data companies, depending on the relevance and quality of the information collected.

As an alternative to Approach A outlined above, IDC proposes to conduct one single survey extended to both data companies and data users. This alternative survey is detailed in Approach B below.

Approach B: Survey of data companies (data suppliers) and data users

2.2.5.3 Survey of data companies and data users

During the First Interim Meeting between the EC and the Study Team, the need for further investigation of the demand side of the data market emerged. IDC therefore considered extending the field research surveys to companies and organisations actually using data (i.e. data users). The alternative was to limit the survey sample to companies simply offering data and making it available to others (i.e. data companies and start-ups).

The main objectives of the survey of data companies and data users would be to:

Gain a better understanding of which companies supply data products and services, and of which companies and organisations use and exploit those data products and services;

Collect data about revenues of data companies and the employment of data companies and data users;

Gain a better understanding about performance and customers of both data companies and data users.


In IDC’s view, this survey represents a viable compromise between the newly identified requirement to encompass the demand-side of the data market and the need to account for the time and budget constraints set out by the current study’s tender specifications.

Selection of the survey sample

As for the surveys in Approach A, these constraints would not allow field research to be carried out in all 28 Member States. IDC will therefore select a sample of countries representative of the EU market. The parameters used to select the most representative countries and cluster them on the basis of their similarities will be the same as those indicated in Approach A, namely:

Table 14 Indicators of selection of MS for the survey

| Type of indicator | Indicator | Indicative sources |
|---|---|---|
| General | Population size | Eurostat |
| | GDP growth rate | Eurostat |
| | Geographic area (North, South, East, West) | Eurostat |
| ICT readiness and diffusion | IT spending on GDP (%) | IDC |
| | Fixed and mobile Internet penetration | IDC |
| | Broadband connections penetration (incl. urban/rural) | IDC, DAE |
| | Market concentration (leading operators' market share) | NRAs, IDC |
| Culture and Technology | % of population that has never used the Internet | Eurostat, DAE |
| | Digital literacy (competence) indicators | Eurostat, DAE |

These parameters, together with the need to keep the newly proposed survey within the time and budget constraints set out by the tender specifications, would lead to a sample of 8 countries to be covered by the survey. IDC proposes the following countries:

1. UK

2. Sweden

3. Czech Republic

4. France

5. Germany

6. Spain

7. Poland

8. Italy

The selected countries would still account for the vast majority of EU28 GDP, represent a good North-South, East-West balance, and provide a good balance also for different levels of IT sophistication. The EFTA countries are excluded from the survey.


As displayed in the table below, the total sample of the new survey will consist of 1,500 completed interviews. This adds 500 interviews to the initial target of approximately 1,000. The increased sample will cover all the countries and company-size segments shown in Table 15.


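Table 15 Proposed survey sample

| Country | Company size (employees) | Completed interviews | Confidence interval (+/- %) |
|---|---|---|---|
| Czech Republic | below 10 | 70 | 11.7% |
| | 10-249 | 50 | 13.9% |
| | 250+ | 30 | 17.9% |
| | Total | 150 | 8.0% |
| France | below 10 | 80 | 11.0% |
| | 10-249 | 70 | 11.7% |
| | 250+ | 50 | 13.9% |
| | Total | 200 | 6.9% |
| Germany | below 10 | 80 | 11.0% |
| | 10-249 | 70 | 11.7% |
| | 250+ | 50 | 13.9% |
| | Total | 200 | 6.9% |
| Italy | below 10 | 80 | 11.0% |
| | 10-249 | 70 | 11.7% |
| | 250+ | 50 | 13.9% |
| | Total | 200 | 6.9% |
| Poland | below 10 | 80 | 11.0% |
| | 10-249 | 70 | 11.7% |
| | 250+ | 50 | 13.9% |
| | Total | 200 | 6.9% |
| Spain | below 10 | 80 | 11.0% |
| | 10-249 | 70 | 11.7% |
| | 250+ | 50 | 13.9% |
| | Total | 200 | 6.9% |
| Sweden | below 10 | 70 | 11.7% |
| | 10-249 | 50 | 13.9% |
| | 250+ | 30 | 17.9% |
| | Total | 150 | 8.0% |
| UK | below 10 | 80 | 11.0% |
| | 10-249 | 70 | 11.7% |
| | 250+ | 50 | 13.9% |
| | Total | 200 | 6.9% |
| Grand total | | 1,500 | |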

The proposed increase in completed interviews will provide a statistically representative sample at country level in each country. Representativeness will be lower at company-size and industry segment level within each country. To mitigate this, soft quotas and specific filtering questions at the beginning of the questionnaires will be applied.
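The confidence intervals in Table 15 appear consistent with the conventional worst-case margin of error for a proportion at a 95% confidence level (z = 1.96, p = 0.5), computed without a finite-population correction. The report does not state its formula, so this is an inference. The sketch below reproduces the published figures. The optional `population` argument applies the finite-population correction, which would narrow the interval where the number of eligible companies in a cell is small.

```python
import math
from typing import Optional


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96,
                    population: Optional[int] = None) -> float:
    """Half-width of the confidence interval for a proportion p estimated
    from n completed interviews, assuming simple random sampling."""
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite-population correction: matters when n is a large share of the population.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


for n in (30, 50, 70, 80, 150, 200):
    print(f"n = {n:>3}: +/- {margin_of_error(n):.1%}")
```

Rounded to one decimal, this gives 17.9%, 13.9%, 11.7%, 11.0%, 8.0% and 6.9%, matching the table. These intervals assume simple random sampling within each cell. With soft quotas, the effective precision may be somewhat lower.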

The survey will be based on a mix of CATI and CAWI techniques and will be finalised in early September in order to launch it by the end of September 2014.

Enterprise sizes are based on the number of persons employed and are aggregated into two segments: below 250 employees and above 250 employees. Enterprises with fewer than 250 employees must also be independent companies, not local subsidiaries of large multinationals.


For the supply-side of the data market (data companies), the survey will gather valuable data to estimate the number of companies and their revenues; these data will be fed into the appropriate indicators. For the demand-side of the market (data users), the survey will provide data on the number of companies, without covering the revenues aspect. For both data companies and data users, the survey will gather relevant inputs to understand the demand and the customers faced by the sample of companies.

Finally, we will take the opportunity to ask respondents about their expected future performance, in order to gather inputs for the forecasts.

2.2.6. Measurement approach

2.2.6.1 Assumptions

Data companies are defined as the supply side of the data market, i.e. the data industry; the core suppliers of data products and services are classified in Sections J and M of NACE Rev. 2.

Data companies can be both SMEs and large companies.

Data products and services may also be supplied by start-ups and by business units developed by users of data products and services. Because the data industry is still emerging, we assume that this is only a minor part of the data industry and that most of the supply developed by users is for internal use.

2.2.6.2 Estimation procedure

The results of the survey will be elaborated according to the following steps:

First, we will estimate the results for each surveyed MS: the total number of data companies and their revenues, and the number of data user companies;

Second, we will cluster the 28 MS on the basis of their similarities in terms of socio-economic parameters, intensity of IT use and other indicators correlated with the actual and potential use of data products and services. We will ensure that each cluster contains at least 2 surveyed MS.

Third, we will estimate the indicators for each MS, by extrapolating the results from the surveyed countries to the other countries in the same cluster, taking into account corrections due to population size.

Finally, we will calculate the indicators for the total EU.
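Steps two to four can be sketched as follows, assuming the clusters have already been defined. Several elements are assumptions for illustration, because the text only says that corrections for population size will be applied. These are the function name, the size-weighted averaging of the surveyed countries' rates, and the use of the Eurostat perimeter count as the size correction.

```python
def extrapolate_data_companies(
    clusters: dict[str, list[str]],
    surveyed_share: dict[str, float],
    perimeter: dict[str, int],
) -> dict[str, float]:
    """Estimate the number of data companies in every Member State.

    clusters       -- cluster id -> member countries
    surveyed_share -- surveyed country -> share of perimeter companies that
                      meet the data company definition (survey result)
    perimeter      -- country -> number of companies in the NACE J + M
                      perimeter (Eurostat)
    """
    estimates: dict[str, float] = {}
    for cluster_id, members in clusters.items():
        surveyed = [c for c in members if c in surveyed_share]
        if len(surveyed) < 2:
            # The text requires at least 2 surveyed MS in each cluster.
            raise ValueError(f"cluster {cluster_id} has fewer than 2 surveyed MS")
        # Cluster rate: surveyed shares weighted by perimeter size.
        weight = sum(perimeter[c] for c in surveyed)
        cluster_share = sum(surveyed_share[c] * perimeter[c] for c in surveyed) / weight
        for country in members:
            # Surveyed countries keep their own rate; the others borrow the cluster rate.
            share = surveyed_share.get(country, cluster_share)
            estimates[country] = share * perimeter[country]
    # Step four: the EU total is the sum over Member States.
    estimates["EU"] = sum(estimates.values())
    return estimates


# Placeholder inputs, not survey results.
clusters = {"A": ["MS1", "MS2", "MS5"], "B": ["MS3", "MS4"]}
shares = {"MS1": 0.10, "MS2": 0.20, "MS3": 0.05, "MS4": 0.05}
perimeter = {"MS1": 1000, "MS2": 1000, "MS3": 2000, "MS4": 500, "MS5": 3000}
print(extrapolate_data_companies(clusters, shares, perimeter))
```

In this toy example, the unsurveyed MS5 inherits cluster A's weighted rate of 15%. The same pattern would apply to revenues, with the IDC maturity indexes mentioned below as a possible further adjustment.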

Given the complexity of this process, it will not be possible to apply it to the EFTA countries.

Based on past experience, we will likely aggregate the surveyed countries to create clusters. The clusters will be defined at that stage, depending on the final selection of the surveyed countries.

This process will be facilitated by IDC databases on IT spending by solution area, including business analytics and big data, and by demographic information on the number of companies, employees and GDP contribution. IDC data on revenues will be used to check consistency and to calibrate results where necessary. The estimation procedure is outlined in the table below.


Table 16 Estimate process

Procedure for the estimation

| Indicator | Data inputs | Calculation procedure | Segmentation | Output / Indicator |
|---|---|---|---|---|
| Number of data companies | Total number of companies representing the population of data companies, in Sections J and M of NACE Rev. 2 (Eurostat) | | By geography: 28 MS + total EU / company size below and above 250 employees | Perimeter of the data companies (supply side) |
| | Survey: % of companies which correspond to our definition of data company | Extrapolation of the sample result to the company population (perimeter) | By geography: 9 MSs surveyed + other countries if possible + total EU / company size below and above 250 employees | Estimated number of data companies |
| Total revenues of data companies | Eurostat: turnover of companies in the data companies perimeter | | By geography: 28 countries + total EU / company size below and above 250 employees | Turnover of the data companies perimeter |
| | Survey results: revenues of the data companies by country and by company size | Extrapolation of the survey results to the company population (perimeter) in the countries where the survey will be launched | By geography: 9 MSs surveyed / company size below and above 250 employees | Estimate of data revenues for the data companies population in the 9 countries surveyed |
| | Survey + IDC maturity indexes based on general IT, Business Analytics and Big Data by country | Extrapolation of the survey results to the countries which are not surveyed | By geography: 28 countries (if possible) / company size below and above 250 employees | Estimate of total revenues for the data companies population in the 28 countries |
| Share of data companies' revenues: ratio between the revenues of data companies and those of all EU companies in the data companies perimeter | Estimated revenues of the data companies and revenues of the perimeter | Ratio between the estimated revenues of the data companies and the revenues of the data companies perimeter | By geography: 28 countries + EU / company size below and above 250 employees | Share of revenues, i.e. ratio of the data companies' revenues to those of the data industry perimeter |
| Productivity | Total revenues of data companies; data workers as previously estimated | Ratio between revenues of data companies and data workers | By geography: 28 countries / company size below and above 250 employees | Productivity |
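For a single country and size band, the chain of calculations in Table 16 can be sketched as follows. The table only says that survey results will be extrapolated to the perimeter. Extrapolating revenues through an average survey revenue per data company is therefore an assumption, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellInputs:
    """Inputs for one country and company-size band (names are illustrative)."""
    perimeter_companies: int     # Eurostat: companies in the NACE J + M perimeter
    perimeter_turnover: float    # Eurostat: turnover of the perimeter, EUR
    data_company_share: float    # survey: share of perimeter companies that are data companies
    avg_data_revenue: float      # survey: average data revenues per data company, EUR
    data_workers: float          # from the data workers indicator


def table16_indicators(x: CellInputs) -> dict[str, float]:
    """Number, revenues, revenue share and productivity of data companies."""
    n_data_companies = x.perimeter_companies * x.data_company_share
    data_revenues = n_data_companies * x.avg_data_revenue
    return {
        "number_of_data_companies": n_data_companies,
        "data_revenues_eur": data_revenues,
        # Share of data companies' revenues in the turnover of the perimeter.
        "revenue_share": data_revenues / x.perimeter_turnover,
        # Productivity: revenues of data companies per data worker.
        "productivity_eur_per_worker": data_revenues / x.data_workers,
    }


# Placeholder inputs, not Eurostat or survey data.
example = CellInputs(perimeter_companies=10_000, perimeter_turnover=5e9,
                     data_company_share=0.1, avg_data_revenue=2e6, data_workers=20_000)
print(table16_indicators(example))
```

When cells are aggregated to country or EU level, counts and revenues can be summed directly. The share and productivity ratios, however, would need to be recomputed from the summed numerators and denominators rather than averaged across cells.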


2.2.6.3 Outputs

Estimated number of data companies

Estimated revenues of data companies

Estimated productivity of data companies

2.2.6.4 Sanity Check

Comparison with the corresponding data for the ICT industry and for the professional services industry

Comparison with the corresponding data for the overall economy

2.3. Indicator 4.1: Size of the Data Market

2.3.1. Definition and statistical reference

The value of the data market corresponds to the value of demand and does not correspond exactly to the aggregate revenues of the data companies. The market equals aggregate revenues plus imports minus exports.
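Written as an identity, the definition reads as follows. The symbols are introduced here only for illustration and are not used elsewhere in the report.

$$V_{g,t} = R_{g,t} + M_{g,t} - X_{g,t}$$

Here $V_{g,t}$ is the value of the data market in geography $g$ and year $t$, and $R_{g,t}$ is the aggregate revenues of the data companies located in $g$. $M_{g,t}$ and $X_{g,t}$ are the imports and exports of data products and services. When $g$ is the EU as a whole, intra-EU flows would cancel out, so only extra-EU imports and exports would enter the calculation.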

The challenge is the measurement of imports and exports both for digital products and services and for professional services.

The first problem is to identify the Standard International Trade Classification (SITC) codes under which data products can be recognised. The second is to estimate the imports and exports of those products. For professional and business services, we should refer to the Balance of Payments, but again we do not have sufficient inputs to estimate data services transactions.

2.3.2. Description of the indicator

This indicator measures the size of the data market in terms of the overall value of the products and services exchanged. Since this indicator depends mainly on the quality and completeness of our field survey, it will probably be impossible to measure it by industry sector or by company size (SMEs vs MLE).

Table 17 Indicator 4: Size of the data market

Indicator 4 - Description

| N. | Name | Description | Type and time | Segmentation |
|---|---|---|---|---|
| 4.1 | Value of the data market | Estimate of the overall value of the data market | Billion €, 2013-2014 | By geography: total EU; possibly by MS |

As indicated in Chapter 2.9, the market monitoring operated by indicator 4.1 will be extended to three leading international competitors of the EU in the data market: the US, China or Japan and Brazil. The approach will be similar to the one described above for the EU and is further detailed in Paragraph 2.9.1 of the present report.


2.3.3. Main data sources

Availability of data

Assessing the value of the data market and of the data economy will not be a simple function of aggregating available data from various sources. The main challenge will be to define the boundaries of the data economy and identify the typology of data which are useful to measure its value. The final assessment will be based on a model which will leverage a range of structured data and estimates.

Assessing the value of intangible goods like data is always challenging: some phenomena may be undervalued and are generally difficult to measure. Much of the value of data can be latent and derive from unknown secondary uses. Adding up only what is gained from its primary use therefore does not describe its value to the EU economy.

For exports, we can take advantage of the survey, in which we will include a couple of questions to estimate the exports of the surveyed companies and countries.

For imports, we will estimate the value through an educated-guess process rather than a formal estimation procedure. The estimate of imports may be driven by:

the results of the export trends

the available average trends of professional and digital services

These estimates of the imports will be validated with the experts involved in the study.

To assess the availability of data on exports and imports of goods and services, IDC monitored:

OECD Balance of Payments (MEI) database, which provides data by country on exports and imports of goods and services with no further level of disaggregation.

The Eurostat Balance of Payments by country database provides higher granularity in terms of the services tracked, with detail on exports (credits) and imports (debits) for twelve service aggregations. These include Communication services and Computer and information services, which allow the perimeter to be narrowed because they cover the key services tied to the data market economy. Moreover, further breakdowns may be available on request or could be retrieved from the national banks or the national statistical offices of the Member States that supply the data to Eurostat.

Eurostat also provides high-tech trade data classified according to SITC Rev. 4. These data are extracted from the COMEXT database, Eurostat's database of official statistics on EU external trade and on trade between EU Member States. The database contains data on imports and exports of goods for the EU Member States, Candidate Countries and EFTA countries.


Table 18 Key External Datasets and Sources for the estimation of the data economy market

Data Owner | Domain | Title | Country coverage | Coverage
OECD | International Trade and Balance of Payments | Balance of Payments (MEI) | All | Exports and imports of goods and services
Eurostat | Economy and Finance | Balance of payments by country (bop_q_c) | All EU28; Iceland and Norway | Goods, Communications services, Computer and information services, Construction services, Financial services, Government services, n.i.e., Insurance services, Other business services, Personal, cultural and recreational services, Royalties and license fees, Services not allocated, Transportation, Travel
Eurostat | Science and Technology | High-tech trade by high-tech group of products in million euro (from 2007, SITC Rev. 4) [htec_trd_group4] | All EU28 | By high-tech product (Aerospace, Computers-office machines, Electronics-telecommunications, Pharmacy, Scientific instruments, Electrical machinery, Chemistry, Non-electrical machinery, Armament)

2.3.4. Gap analysis

N. | Indicator Name | Type of data | Availability of data: Public sources | Availability of data: IDC & other private sources | Quality and Reliability of data | Feasibility of indicator
4.1 | Value of data market | Statistics on balance of payments (BP) | Low | None | Low | Low, feasible through survey

Data on imports and exports of intangible assets are unavailable or incomplete. If possible, we will include a couple of questions in the survey to estimate exports.


2.3.5. Measurement approach

2.3.5.1 Assumptions

The total revenues of all data companies can be estimated by projecting the results of the survey.

Nearly all exports are made by data companies.

Nearly all imports are made by data users.

On average, import trends of data services are similar to those of other advanced professional and digital services.

2.3.5.2 Estimation procedure

Table 19 Estimation procedure of the data market

Procedure for the estimation
Data inputs | Calculation procedure | Segmentation | Output / Indicator
Data companies' revenues | Estimate based on the aggregated results of the survey, extrapolated to the total EU | Geography: 8 countries of the survey + EU28 | Total revenues of data companies in Europe
Exports: from the survey | Extrapolation to the sample of countries | Geography: 8 countries of the survey + EU28 | Estimate of exports of data products and services
Imports: no data available | Estimate based on the BP for business services | Geography: 8 countries of the survey + EU28 | Estimate of imports of data products and services
Total revenues, exports and imports (above) | Total revenues + Exports - Imports | Geography: 8 countries of the survey + EU28 | Market value

The first step in estimating the market value will be to aggregate the survey results on the total and average revenues of data companies, extrapolated to the total EU according to the procedure described in the survey paragraph above (using country clusters). The estimate will be validated by cross-checking it against IDC’s Big Data market estimates, since Big Data is a large subcomponent of the total market.
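The extrapolation step can be illustrated with a minimal sketch. It assumes, hypothetically, that each surveyed country keeps its own average revenue per data company, while each non-surveyed country is assigned the average of the surveyed countries in its cluster. Country revenues are then computed as average revenue multiplied by the estimated number of data companies. Cluster labels, country names and figures are placeholders. The study's actual clustering procedure is the one described in the survey paragraph.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Country:
    name: str
    cluster: str                          # hypothetical cluster label
    data_companies: int                   # estimated number of data companies
    avg_revenue: Optional[float] = None   # EUR million per company; None if not surveyed


def extrapolate_revenues(countries: List[Country]) -> Dict[str, float]:
    """Total data-company revenues per country, extrapolated via cluster averages.
    Raises KeyError if a cluster contains no surveyed country."""
    surveyed: Dict[str, List[float]] = {}
    for c in countries:
        if c.avg_revenue is not None:
            surveyed.setdefault(c.cluster, []).append(c.avg_revenue)
    cluster_avg = {cluster: mean(vals) for cluster, vals in surveyed.items()}

    totals: Dict[str, float] = {}
    for c in countries:
        # Surveyed countries use their own average; others borrow the cluster average.
        per_company = c.avg_revenue if c.avg_revenue is not None else cluster_avg[c.cluster]
        totals[c.name] = c.data_companies * per_company
    return totals


if __name__ == "__main__":
    sample = [
        Country("Country A", "cluster 1", 100, 2.0),
        Country("Country B", "cluster 1", 50),
        Country("Country C", "cluster 2", 200, 1.0),
        Country("Country D", "cluster 2", 10),
    ]
    totals = extrapolate_revenues(sample)
    print(totals, "EU total:", sum(totals.values()))
```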

The next steps will be to estimate exports and imports, as described above. These are respectively added to and subtracted from the total revenues of data companies to obtain the market value. At this stage, exports and imports may not be very significant.

2.3.5.3 Outputs

The data market value for the total EU28 and at least for the 8 surveyed MS. We may be able to estimate the individual market value for the other MS, but without imports and exports.


2.3.5.4 Sanity check

Consistency with the other indicators, such as the revenues and the number of companies supplying data products and services by country, and the number of data user companies.


2.4. Indicators 4.2-4.3: Value of the Data Economy

2.4.1. Definition and statistical reference

The impacts of data products and services depend mainly on:

the impact of the data supply side, in terms of contribution to GDP and to employment;

the impacts generated by the exploitation of data, i.e. the impacts on the demand side.

The impacts generated by the exploitation of data across the economic system include (OECD, 2013; McKinsey, 2011):

optimizing production and delivery processes (data-driven processes);

optimizing marketing by providing targeted advertisement;

enhancing research and development and developing new products and services;

innovating business models;

creating transparency and diffusion of information.

Finally, it should be noted that the impacts of the data supply side are immediate and measurable, while demand-side impacts are harder to capture, especially in the early stages of an emerging industry.

In the early stages, the impacts on the total economy depend mainly on:

the supply-side impacts of increasing demand;

the indirect impacts related, for example, to the re-investment of savings.

The induced impacts related to productivity improvements, by contrast, are not immediate.

2.4.2. Description of the indicator

The value of the data economy is an important indicator because it includes the direct, indirect and induced impacts of the data market on the economy. We have decided to measure it through two simple but relevant indicators: the contribution of the data economy to GDP in absolute value and as a percentage. At the present stage these indicators may turn out to be relatively small. However, it is very important to measure them as a starting point for monitoring the growth of the data economy and its potential development.

Table 20 Indicators 4.2 and 4.3

Indicators: Value of the Data Economy
N. | Name | Description | Type and Time | Segmentation
4.2 | Value of the data economy | Value of the data market plus direct, indirect and induced impacts on the EU economy | Billion €, 2013-2014 | By geography: Total EU + EU28; if possible, EFTA countries (except Liechtenstein)
4.3 | Incidence of the data economy on GDP | Ratio between the value of the data economy and EU GDP | %, 2013-2014 | By geography: EU28 + Total EU; if possible, EFTA countries (except Liechtenstein)
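Once the impact components have been estimated, indicators 4.2 and 4.3 reduce to two simple calculations. The sketch below only restates the table's definitions in code. The function names are illustrative, and the component values in the usage lines are placeholders, not estimates.

```python
def data_economy_value(market_value: float, direct: float,
                       indirect: float, induced: float) -> float:
    """Indicator 4.2 (billion EUR): data market value plus direct,
    indirect and induced impacts on the economy."""
    return market_value + direct + indirect + induced


def gdp_incidence_pct(economy_value: float, gdp: float) -> float:
    """Indicator 4.3 (%): value of the data economy as a share of GDP."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return 100.0 * economy_value / gdp


if __name__ == "__main__":
    value = data_economy_value(50.0, 20.0, 15.0, 10.0)  # placeholder inputs
    print(value, gdp_incidence_pct(value, 1000.0))
```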

As indicated in Chapter 2.9, the monitoring carried out through indicators 4.2 and 4.3 will be extended to three of the EU's leading international competitors in the data market: the US, China or Japan, and Brazil. The approach will be similar to the one described above for the EU and is further detailed in Paragraph 2.9.1 of the present report.

2.4.3. Main data sources

The data sources are the same as those used to estimate revenues. In addition, we can draw on private research about the impacts of emerging industries and on IDC research on ICT and its macroeconomic impact.

2.4.4. Gap analysis

N. | Indicator Name | Type of data | Availability of data: Public sources | Availability of data: IDC & other private sources | Quality and Reliability of data | Feasibility of indicator
4.2/4.3 | Value of the data economy | Statistics on data consumption | None | None | None |

The estimate of the value of the data economy will be based on estimated multipliers of data products and services on the whole economy, which depend on:

the multiplier effect of data products and services on innovation across the whole economy;

the multiplier effect of increased revenues for user companies.

The multipliers will be estimated using similar studies of emerging industries at an early stage of innovation adoption.

2.4.5. Measurement approach

2.4.5.1 Assumptions

Data-related products and services may potentially be adopted by all companies (both SMEs and large companies) in all industries.

The adoption rate of data-related products and services depends on the technological maturity of companies and on the IT spending of each country.

The adoption of data-related products and services will provide both direct and indirect impacts on industry and on the overall economy.

The adoption of data-related products and services may produce savings (where these products replace more traditional ones), which in turn may increase re-investment.

The adoption of data-related products and services may lead to more effective decisions and, in turn, greater customer satisfaction.


2.4.5.2 Estimation procedure

The positive direct impacts may create a virtuous cycle of development. Since the data industry is still at a very early stage, these estimates should be considered best-effort estimates, to be fine-tuned as adoption progresses.

The estimate of the value of the data economy will be based on estimates of:

Direct impacts: savings on CRM investments and other marketing initiatives, and savings on internal resources;

Indirect impacts: improved decision making, launch of innovations, and creation of new businesses.

Cumulative impacts will be considered carefully, since they may generate a multiplier effect. This is all the more important because the use of data products and services will not have the same impact on all user industries.

IDC will cluster industries according to whether they may be affected by a high, medium or low multiplier effect, in order to estimate the overall effect on the EU economy. Finance, retail and energy, for example, are industries where the impact of intensive data use may be high. Direct and indirect impacts and the possible multiplier effects will not occur within a single year; they may take at least a couple of years to materialize.
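The clustering idea can be illustrated with a minimal sketch. It assumes that the economy-wide effect is each industry's spending on data products and services multiplied by the multiplier of its cluster, and that this effect is phased in evenly over a fixed number of years. The multiplier values, the two-year phase-in and the cluster assignments are hypothetical placeholders. The study's multipliers are to be derived from comparable studies of emerging industries.

```python
from typing import Dict, List

# Hypothetical multipliers: total economic effect per EUR of data spending.
MULTIPLIERS: Dict[str, float] = {"high": 2.0, "medium": 1.5, "low": 1.2}


def economy_wide_effect(spending: Dict[str, float], cluster_of: Dict[str, str]) -> float:
    """Sum of industry data spending weighted by the industry's cluster multiplier."""
    return sum(amount * MULTIPLIERS[cluster_of[industry]]
               for industry, amount in spending.items())


def phase_in(total_effect: float, years: int = 2) -> List[float]:
    """Spread the effect evenly over `years`, since impacts take time to materialize."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return [total_effect / years] * years


if __name__ == "__main__":
    # Placeholder spending (billion EUR) and illustrative cluster assignments.
    spending = {"finance": 10.0, "retail": 8.0, "energy": 5.0, "other": 4.0}
    clusters = {"finance": "high", "retail": "high", "energy": "high", "other": "low"}
    effect = economy_wide_effect(spending, clusters)
    print(effect, phase_in(effect))
```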

IDC will develop a detailed model drawing on existing models that calculate the economic impacts of pervasive IT innovations.

Impacts on the economy will clearly depend on the diffusion rate, which in turn also depends on general economic conditions in the coming years.

2.4.5.3 Outputs

Estimated value of the data economy.

2.5. Indicator 5: Data Workers Skills Gap

2.5.1. Definition and statistical references

This indicator is based on the definition of data workers in the Taxonomy, reported in § 2.2.4 of this report. Data worker skills are defined as the skills specifically needed by workers who collect, store, manage and analyze data as their primary activity.

This indicator aims to measure whether there is a bottleneck caused by a gap between the demand for and supply of data worker skills in the EU, and how significant this gap is.

The supply of data workers equals the data skills supply stock, which includes individuals employed as data workers plus unemployed data workers. On the demand side, we assume that in the short term demand is the sum of filled and open positions, i.e. the demand for data workers includes employed data workers plus unfilled vacancies.
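These definitions imply a simple identity. Employed data workers appear on both sides, so the gap depends only on vacancies and unemployment. The notation below is introduced here for illustration only: E is employed data workers, U unemployed data workers, V unfilled vacancies, D demand, S supply and G the gap.

```latex
\begin{aligned}
D &= E + V, \qquad S = E + U,\\
G &= D - S = (E + V) - (E + U) = V - U .
\end{aligned}
```

A positive G indicates a shortage of data workers, and a negative G an excess supply. This is why only U and V remain to be estimated.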

Since the number of data workers has already been estimated, the following information and data are still needed to estimate the skills gap:

a clear definition of the skills needed and an assessment of their potential supply;

the number of unemployed data workers;

the number of unfilled vacancies.


2.5.2. Description of the indicator

Due to lack of data, the indicator will be segmented by geography only.

Table 21 Indicator 5: Data Workers Skills Gap

Indicator 5 - Description
N. | Name | Description | Type and Time | Segmentation
5.1 | Data Workers Skills Gap | Gap between demand and supply of data workers | Absolute number, 2013-2014 est. | By geography: total EU and EU28; if possible, EFTA countries (except Liechtenstein)

As indicated in Chapter 2.9, the monitoring carried out through indicator 5 will be extended to three of the EU's leading international competitors in the data market: the US, China or Japan, and Brazil. The approach will be similar to the one described above for the EU and is further detailed in Paragraph 2.9.1 of the present report.

2.5.3. Main data sources

The main data sources are the same used for the estimation of the total data workers.

2.5.4. Gap analysis

N. | Indicator Name | Type of data | Availability of data: Public sources | Availability of data: IDC & other private sources | Quality and Reliability of data | Feasibility of indicator
5.1 | Data workers skills gap | Statistics on occupation and ICT intensity by industry | M | M | H | EU = Medium; By geography: EU medium to low, EFTA low; By industry = low; By company size = none

Data availability and reliability are worse for the skills gap than for the number of data workers. This is because we have to estimate vacancies and unemployed data workers, and no additional information is available to do so.

To complement the desk research, we will carry out a number of qualitative interviews with the following experts:


Paul Costelloe, Director of Executive Education at the European CIO Association. Paul is willing to help us collect opinions from national CIO associations about the profile of data workers.

Theos Evgeniou, Academic Director, Insead eLab, responsible for Insead's research on Big Data analytics and business, who recently launched a master's course on the subject.

Silvia Leal, Academic Director, IE Business School, Madrid, and CIONET Committee Member, who recently launched a master's degree in business analytics and big data.

Dr Michael Beigl, Dean of the Informatics Lab of the Karlsruhe Institute of Technology and coordinator of the Smart Data Innovation Lab, a leading German initiative to develop big data applications and services.

These interviews should help us gain a better understanding of the skills specifically needed by data workers.

2.5.5. Measurement approach

2.5.5.1 Assumptions

Unemployment among data workers is extremely low; we assume that it does not exceed a natural rate of 2-3%.

Vacancies in data companies are moderately higher than in data user companies, because data companies require slightly more advanced skills.

2.5.5.2 Measurement approach

The main missing data concern vacancies. To estimate them, we have included a question in the survey addressed to data suppliers.

Our objective is to estimate vacancies in both supplier and user companies. Vacancies are expected to be lower in user companies; this assumption will be validated with the experts.
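Under these assumptions, the two missing stocks can be derived from rates. The sketch below assumes the usual rate definitions: the unemployment rate is U/(E+U), and the job vacancy rate is V/(E+V), which is how Eurostat defines it. It inverts both to obtain U and V from employment. The 2.5% default is the midpoint of the 2-3% natural rate assumed above. The two vacancy-rate defaults are hypothetical placeholders, chosen only to respect the assumption that vacancies are higher in data companies than in user companies.

```python
def stock_from_rate(employed: float, rate: float) -> float:
    """Invert rate = X / (employed + X) to obtain the stock X."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    return employed * rate / (1.0 - rate)


def data_skills_gap(employed_suppliers: float, employed_users: float,
                    unemployment_rate: float = 0.025,
                    vacancy_rate_suppliers: float = 0.05,   # hypothetical
                    vacancy_rate_users: float = 0.03) -> float:  # hypothetical
    """Gap = vacancies - unemployed; positive values indicate a shortage."""
    employed = employed_suppliers + employed_users
    unemployed = stock_from_rate(employed, unemployment_rate)
    vacancies = (stock_from_rate(employed_suppliers, vacancy_rate_suppliers)
                 + stock_from_rate(employed_users, vacancy_rate_users))
    return vacancies - unemployed


if __name__ == "__main__":
    # Placeholder employment figures, not study estimates.
    print(data_skills_gap(employed_suppliers=190_000, employed_users=980_000))
```

In practice, the supplier vacancy rate would come from the survey question mentioned above, and the user rate from expert-validated assumptions.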

2.5.5.3 Output

Estimate of the gap between demand and supply of data workers.

2.6. Indicator 6: Citizen’s Reliance on the Data Market

2.6.1. Definition and statistical references

Measuring the level of citizens' reliance on data would provide a more complete picture of the importance and social benefits of the data economy to the EU. For this reason we suggest an indicator measuring the percentage of the population who rely on data-based products and services to make informed decisions. Before designing such an indicator we need to clarify its real goal and scope. Everybody looks for information before making any decision, but the focus of this indicator should be on the specific benefits deriving from innovative data-based products and services.

In fact, the EU Data Value chain strategy has selected as one of its key policy targets to “increase citizens’ use of data for informed behavioral decisions”. This target derives from research on behavioral economics and behavioral decision theory, focused on how to improve societal and individual decision making when managing risk or facing difficult choices. Behavioral economics has shown that emotions and other non-rational factors affect our behavior, even when we believe we are being perfectly rational. There is also an interesting debate on how policy makers can provide information to citizens in the right way to influence them to behave in their own best interests. This is the well-known “Nudge” theory of policy making, developed by the American economist Richard Thaler and legal scholar Cass Sunstein.


A well-known finding is that individuals are much more likely to accept certain choices when these are presented as “opt-out” rather than “opt-in”.

Informed decisions are of primary relevance in several social fields, ranging from healthcare and wellbeing to ageing, safety (driving, practising sports...) and raising children. It is no wonder that policy makers are making it a priority to understand more about behavioural sciences and information. These new challenges are well presented in the communication strategy of the FDA (Food and Drug Administration, the regulatory agency of the US government): “A major challenge is adapting FDA’s communications to rapidly evolving technologies that are driving major shifts in how consumers choose to receive and share information. To facilitate the translation of science-based regulatory decisions and information into public health gains, FDA must strengthen social and behavioral sciences in the areas of understanding and reaching diverse audiences, ensuring audience comprehension, and evaluating the effectiveness of communications in changing behaviors related to the use of regulated products”.

The European Commission (DG SANCO, for instance) is also exploring this field of behavioural policy, for example regarding how to deal with information about the risks of social media use by children and teenagers. A better understanding of the role of data in this field can help improve policy making in many policy areas at European and national level.

The question is how exactly to define the use of data for informed behavioural decisions, how to identify the citizens who make such use of data, and how to measure them. Finally, we want the indicator to measure progress towards better-informed decisions. This requires some measure of how effectively information and data are used, for example whether citizens actually make better decisions when they have access to the right type of data.

As a starting point, we suggest measuring this indicator as follows:

The citizen’s data market indicator is the share of the population with a high reliance on data products and services, linked with their main socio-demographic characteristics (age, gender, education, profession, income...) and possibly with their behaviour.

We can measure this indicator only by inference, by identifying the percentage of the population using online services with a high level of data and information content. We also wish to link this indicator with the users’ socio-demographic profiles and possibly correlate this with their behavior (for example the decision they make after accessing the information) in order to deepen the analysis and assess the potential benefits to be gained. Finally, we focus on citizens, not consumers: therefore we are interested in decision-making in the policy and social domain, relevant for individual and collective well-being and social interaction.

This indicator could include, for example, the citizens who look for information on the Internet to select a doctor or a hospital based on their performance and other patients’ opinions, or to select a school for their children or check the environmental and security conditions of a neighborhood where they want to move, or else consumers checking information about provenance on a product’s label before buying it. But while there are plenty of statistics about the usage of online services, we need to use a more sophisticated approach in order to select those with a high content of data.

The desk research carried out so far is not sufficient to finalize the design of the indicator. We need more and better information about the type of services and the profile of users. We suggest therefore the following approach:

To carry out desk research on the main data sources about citizens’ lifestyles and choices regarding digital products and services, from digital literacy to the frequency of and attitudes towards the use of existing digital apps and services, which can be correlated with the relevant data products and services;

To leverage the research carried out in this study about the emerging data-related products and services and their potential benefits for consumers;


To carry out a few qualitative interviews with opinion leaders in the fields of behavioral policy, data-intensive policy making (which we have already started to analyse in our second story, deliverable D.3.2) and market research, to collect suggestions and insights on the scope and approach of the indicator;

To finalize the design of the indicator and the choice of the most appropriate measurement method (provisionally outlined in the following paragraph).


2.6.2. Description of the indicator

The indicative description of the indicator is presented in the table below.

Table 22 Indicator 6: Citizens’ reliance on the data market

Indicator 6 - Description
N. | Name | Description | Type and Time | Segmentation
6.1 | Citizens’ data market | Share of the population with a high reliance on data products and services relevant for informed decision making | %, 2013-2014 est. | By geography: EU28 + total EU; if possible, EEA-EFTA countries. By socio-demographic parameter: gender, age, possibly level of education and/or income

Indicator 6 is very experimental at this stage, and we are not planning to measure it in countries outside the EU (please see Paragraph 2.9.1 for additional details).

2.6.3. Main data sources

There are no ready-made statistics for such an indicator. We have investigated the main public and IDC sources on the diffusion and use of online services.

The main sources that we have identified are:

Eurostat ICT households’ survey, including for example:
o % of population using internet for ordering goods or services (1.1.1)
o % of population obtaining info from public authorities web sites (1.1.8)
o % of population using internet for finding info about goods and services (1.2.C.3)
o % of population using internet for seeking health information (1.2.C.7)
o % of population using internet for looking for info about education, training or course offers (1.2.C.19)
o % of population using internet for interaction and obtaining info from pub. auth. (1.2.E.1)
o % of population using internet for reading newspaper/news magazine and for consulting wiki (1.2.C.9-16)

All these data are given for the 28 EU countries and can be filtered by age (16-74) and gender, as well as other more detailed parameters.

The Digital literacy indicators of the DAE scoreboard

IDC’s Digital Media Market Model

IDC’s Consumerscape survey, a periodic survey carried out by IDC on digital consumers’ choices across the world. It focuses mainly on purchasing behaviour for iPhones, iPads and laptops, but also covers consumers' feelings about Internet connectivity.

Consumerscape segments consumers into 6 different clusters based on their values and behaviour, from technophobes to pioneers. Unfortunately, this segmentation is only available for the 6 largest EU countries, although it is also available and comparable for 25 more countries across the world. However, these data could be extrapolated to the rest of the EU, based on the clustering methodology for the data market outlined in the description of Indicators 2 and 3 on data companies.


Figure 4 Worldwide Consumer Market Segments, 2013

Source: IDC Consumerscape

2.6.4. Gap Analysis

N. | Indicator Name | Type of data | Availability of data: Public sources | Availability of data: IDC & other private sources | Quality and Reliability of data | Feasibility of indicator
6.1 | Citizens’ data market | Data on use of online services, behavior, choices | M | M | M | By geography: Medium-High; By socio-demographic parameters: Medium

According to our assessment, data availability is medium: there is plenty of data on consumer choices, but it is more difficult to focus on the specific aspects we are interested in. The integrity and comparability of data about consumers across Europe are better than those of data on businesses.

The indicator can be measured by country and most likely by socio-demographic variables commonly used to measure Internet-related behavior, such as gender, age, education and income level. It could be valuable to measure it by end-user market (for example healthcare or government services). However, this would probably require segmentations different from those used for the other indicators, which would create confusion, and it would also be quite complex to do.


2.6.5. Measurement approach

While the final decision for the data collection approach will depend on the finalization of the design of the indicator, the following decisions have been made:

We rule out carrying out a large original survey of citizens: there are already plenty of such surveys, and in any case this would go beyond the available resources.

We will carry out 2-3 scoping interviews in the initial phase of WP2 to finalize the design of the indicator and the measurement approach.

The data collection will focus on the available data sources about the diffusion and usage of online services and on consumer/citizen behavior as indicated above.

We are considering the option of using innovative data collection methods on the Internet. For example, we could use Google Trends to compare the frequency of use of certain keywords in each MS as a proxy for users' behavior, and compare it across Europe. This approach depends on developing an appropriate research hypothesis and selecting the right keywords, which would also have to be translated into the local language of each country considered.

The measurement of the indicator will most likely be based on a compound index aggregating various existing indicators. This should also make it possible to forecast the indicator based on projections of online service usage.
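A compound index of this kind can be sketched minimally as follows. The sketch assumes one common construction, which is not necessarily the one the study will adopt. Each component indicator is min-max normalised across countries, and the normalised values are combined with weights that sum to one (equal weights by default). Component names, countries and values are hypothetical. In practice, the components could be the Eurostat ICT household survey shares listed above.

```python
from typing import Dict, Optional

Component = Dict[str, float]  # country -> indicator value (e.g. % of population)


def min_max(values: Component) -> Component:
    """Rescale values to [0, 1] across countries; constant inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {country: 0.0 for country in values}
    return {country: (v - lo) / (hi - lo) for country, v in values.items()}


def composite_index(components: Dict[str, Component],
                    weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Weighted average of min-max normalised components.
    All components must cover the same countries; weights should sum to 1."""
    names = list(components)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    normalised = {name: min_max(components[name]) for name in names}
    countries = normalised[names[0]]
    return {c: sum(weights[n] * normalised[n][c] for n in names) for c in countries}


if __name__ == "__main__":
    components = {
        "ordering_online": {"A": 40.0, "B": 60.0, "C": 80.0},       # hypothetical
        "health_info_online": {"A": 30.0, "B": 50.0, "C": 40.0},    # hypothetical
    }
    print(composite_index(components))
```

Min-max scaling is relative to the countries observed in a given year. To compare index values across years, or to build forecasts from projected component values, fixed reference bounds would be needed.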

2.7. Data collection: overview of field research

In our methodological approach we have already ruled out a large survey across Europe, mainly for practical reasons. However, as discussed for each of the indicators presented in the previous paragraphs, we do plan to carry out selected direct interviews and targeted surveys to collect the missing evidence and data. The following table summarizes the field research to be carried out, which has already been described in detail for each indicator in the previous paragraphs.

Table 23 Summary of Field research activities

Indicators | Direct Interviews | Target Type of Surveys
1. Number of data workers | Selected interviews with statistical experts | None
2. Number of data companies and number of data user companies | | Survey of a representative sample of enterprises from all sectors in 8 MS, to be extrapolated to the rest of the EU
3. Revenues of data companies | |
4. Data Market Value | None |
5. Data workers skills gap | Selected interviews with key data industry representatives (maybe the same as for Indicator 2) and with universities/higher education institutions/skills experts | None
6. Citizens’ reliance on the data market | Selected interviews with opinion leaders on behavioral policy, data-intensive policy making, consumer market research | None


2.8. Forecasting Indicators

In our proposal we committed to forecasting the market indicators described above to 2015 and 2020. This will be done by projecting different possible trajectories of market development under alternative macroeconomic and framework conditions, based on 2-3 alternative scenarios. The main objective of this activity will be to investigate the range of development options for the market, the potential barriers, and the role of policies in removing bottlenecks and promoting favorable conditions.

The process will be based on the following main steps:

1. Identification and selection of the main factors affecting the evolution of the emerging data market in the period 2015-2020. These correspond to the main framework conditions of market development, as described in the EDM monitoring tool (see Chapter 3, Paragraph 3.3);

2. Development of key assumptions on the main trends to 2020, building on IDC's quarterly forecast process and on the results of the desk and field research carried out to measure the indicators;

3. Development of a baseline scenario and of alternative growth scenarios, based on different combinations of the main trends and on the evolution of the main policies and framework conditions;

4. Forecast calculations projecting the indicators under the alternative scenarios;

5. Communication of the scenario results and collection of feedback from the EC, the peer reviewers and the stakeholder community;

6. Revision and finalization of forecasts and scenarios.
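Step 4 can be illustrated with a minimal sketch that projects an indicator from a base year under alternative scenarios, using constant compound annual growth rates. The scenario names, the growth rates and the 2014 base year are hypothetical placeholders. The actual scenarios will rest on the key assumptions developed in Step 2 and need not follow constant growth paths.

```python
from typing import Dict, Iterable

# Hypothetical compound annual growth rates per scenario.
SCENARIOS: Dict[str, float] = {"baseline": 0.10, "high_growth": 0.15, "low_growth": 0.05}


def project(base_value: float, base_year: int, year: int, cagr: float) -> float:
    """Value in `year`, assuming constant compound growth from `base_year`."""
    return base_value * (1.0 + cagr) ** (year - base_year)


def forecast(base_value: float, base_year: int = 2014,
             years: Iterable[int] = (2015, 2020)) -> Dict[str, Dict[int, float]]:
    """Project the indicator to each target year under every scenario."""
    target_years = tuple(years)
    return {name: {y: project(base_value, base_year, y, g) for y in target_years}
            for name, g in SCENARIOS.items()}


if __name__ == "__main__":
    for scenario, path in forecast(100.0).items():  # placeholder base value
        print(scenario, {year: round(value, 1) for year, value in path.items()})
```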

2.8.1. Step 2: development of key assumptions

The key assumptions for the forecast scenarios will be developed from the following sources:

The qualitative interviews;

The webinars and interactive discussions in the stakeholder community to crowdsource ideas;

IDC’s key assumptions about the overall IT market and the Big Data market.

We expect to gain valuable inputs from the stakeholders on ways to drive the uptake of data services and applications by EU end-user companies and on the policy measures needed to facilitate the creation of new companies in the European data market.

IDC endeavours to document the assumptions behind each of our forecasts. We have developed an internal tool called the IDC Assumption Builder. Key to the IDC Assumption Builder is the development of a mental model of the market being forecast, which includes assumptions about the economy, supply, the labour force, etc. The IDC Assumption Builder makes it easy for an analyst to document his or her assumptions in each sector of the mental model, determine those that actually drive the forecast, and highlight the assumptions that have the power to radically alter a forecast if expectations change.

The key assumptions will be classified according to the main type of factor that may affect the forecast. For this specific study we will develop key assumptions for the following areas (which may be revised and expanded during the research):

Macroeconomic trends (GDP growth trends)

ICT trends (including IT spending as a share of GDP by country, absolute levels of IT spending, and business and consumer adoption of ICT innovation, including not only big data but also cloud, IoT, mobile devices and apps, and social technologies)

Policy and regulatory trends (EC and national development policies of the data market, Open data policies, PSI policies, Privacy and data protection issues, and so on)

R&D&I trends for data-related research and innovation, including availability of risk capital


Education and training trends for data skills (evolution of relevant skills requirements and of skills supply, availability of data talents and skills)

Interoperability and standards trends for the data market

The development of the Digital Single Market, including the potential relevance and role of the European Digital Service Infrastructure;

The key assumptions will be rated by their level of impact on, and uncertainty about, the evolution of the data market. High-impact, low-uncertainty factors will be common assumptions of all scenarios, while high-impact, high-uncertainty factors will be the differentiating factors between scenarios.
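The rating rule above can be illustrated with a minimal Python sketch. It assumes a simple two-level (high/low) scale for impact and uncertainty, which is our simplification for illustration; the factor names and ratings in the example are hypothetical and are not results of the study. The text does not specify how low-impact factors are treated, so the sketch simply sets them aside.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    name: str
    impact: str       # "high" or "low" (hypothetical two-level scale)
    uncertainty: str  # "high" or "low"


def classify(assumptions):
    """Split assumptions into common, differentiating and other groups."""
    common, differentiating, other = [], [], []
    for a in assumptions:
        if a.impact == "high" and a.uncertainty == "low":
            common.append(a.name)           # assumed in every scenario
        elif a.impact == "high" and a.uncertainty == "high":
            differentiating.append(a.name)  # varied across scenarios
        else:
            other.append(a.name)            # low impact: not a scenario driver
    return common, differentiating, other


# Illustrative ratings only; these are not the study's actual ratings.
ratings = [
    Assumption("GDP growth trends", impact="high", uncertainty="low"),
    Assumption("Speed of big data diffusion in the user market", impact="high", uncertainty="high"),
    Assumption("Hypothetical low-impact factor", impact="low", uncertainty="high"),
]
common, differentiating, other = classify(ratings)
print("All scenarios:", common)
print("Differentiating:", differentiating)
```

In this reading, only the differentiating group changes from one scenario to the next, while the common group is held fixed across all of them.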

2.8.2. Step 3: scenarios development

Predicting the future is impossible, especially under high uncertainty, but exploring the likely or possible interactions between the main trends makes it possible to build alternative scenarios presenting the main paths ahead. This in turn helps to evaluate possible actions, their consequences and the risks if no action is taken.

Within the context of this study, we are not so much exploring wildly different scenarios; rather, our objective is to project the potential trajectories of the data market and analyze the potential bottlenecks or barriers which may constrain its development. The time horizon of our forecast is in fact the medium term (5 years ahead), which reduces the range of uncertainties affecting the socio-economic context. The major macroeconomic trends, such as GDP growth, productivity growth and demographics, can be projected within a reasonably small range of variation (unfortunately for Europe, since low growth is currently the most likely scenario). “Wild card” innovations are always possible, but in reality even disruptive changes take time to develop and penetrate deeply into the socio-economic system, especially in Europe, where there is a very strong conservative bias. For these reasons, we believe that it will be possible to develop a baseline scenario assembling the most likely developments, contrasted with alternative scenarios testing the potential impacts of the main uncertainties. One of the key uncertainties, for example, will be the speed of diffusion of big data tools and technologies in the user market, which may be constrained by demand-side or regulatory barriers.

2.8.3. Step 4: Forecast calculations

In this step we will need to use different calculation methods since our indicators are substantially different. In summary:

Indicators 1 to 4.1 measure the data market and industry: they will be forecast by extrapolating the indicator values from the baseline year 2014 along alternative potential market growth trajectories. IDC’s forecasts and estimates will be an important input to this model. We will not forecast the other sub-indicators (for example the share of data workers in total employment), because this would require forecasting the entire EU economy (such as the evolution of EU employment in all the categories considered), and hence a forecast model too complex for this project.
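As an illustration of extrapolating from the 2014 baseline along alternative growth trajectories, the minimal sketch below applies a constant compound annual growth rate for each scenario. This is an assumption made for illustration only. The actual forecast model will draw on IDC’s forecasts and the scenario assumptions, and the baseline value, scenario names and growth rates shown here are hypothetical.

```python
def project(baseline_2014: float, annual_growth: float, last_year: int = 2020) -> dict[int, float]:
    """Project a value from the 2014 baseline with a constant compound annual growth rate."""
    values = {2014: baseline_2014}
    for year in range(2015, last_year + 1):
        values[year] = values[year - 1] * (1 + annual_growth)
    return values


# Hypothetical baseline (bn EUR) and growth rates; these are not study estimates.
BASELINE_MARKET_BN_EUR = 100.0
SCENARIOS = {"low growth": 0.05, "baseline": 0.10, "high growth": 0.15}

forecasts = {name: project(BASELINE_MARKET_BN_EUR, g) for name, g in SCENARIOS.items()}
for name, series in forecasts.items():
    print(f"{name}: 2015 = {series[2015]:.1f}, 2020 = {series[2020]:.1f} bn EUR")
```

Using one constant rate per scenario keeps the sketch readable. A fuller model would let growth vary by year, country and industry.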

Indicators 4.2 to 4.4 measure the value of the data economy and its incidence on GDP: they will be forecast by leveraging the GDP and IT spending forecasts to 2020 already developed by IDC in recent projects on the impacts of cloud computing and on the impact of globalization on e-skills in Europe, in which we developed macroeconomic and ICT scenarios.

Indicators 5 (data workers skills gap) and indicator 6 (citizen’s data economy) will require the development of additional sets of assumptions because they focus on domains not covered by the previous indicators, specifically the supply of data skills for indicator 5 and forecasting consumers’ use of data services and products for indicator 6. They will be developed in coherence with the data market scenarios which represent the core of our forecasts.


Table 24 Forecast of Main Data Market indicators

Indicators Data Market Forecast 2015-2020 Methodology approach

1. Number of data workers

By Geography: 28 EU MSs + total EU

By Industry: 11 sections

Development of key assumptions on market growth trends to 2020

Development of a baseline scenario and alternative market growth scenarios

Development of forecast model to calculate the indicators depending on the alternative market growth trajectories starting from baseline year 2014

2. Number of data companies

By Geography: 28 EU MSs + total EU

By company size:

Below 250 employees; above 250 employees

3. Revenues of data companies

By Geography: 28 EU MSs + total EU

By company size:

Below 250 employees; above 250 employees

4.1 Data market value

By Geography: 28 EU MSs + total EU

We are not planning to forecast these indicators for international competitors of the EU.

Table 25 Forecast of Main Data Economy indicators

Indicators Data economy Value Forecast Output 2015-2020 Methodology approach

4.2. Value of the data economy

By Geography: Total EU

If possible, EU28

If possible, EEA-EFTA countries

Data market impact model forecast under alternative data market and macroeconomic growth scenarios

4.3. Incidence of the data economy on GDP

By Geography: Total EU

If possible, EU28

If possible, EEA-EFTA countries

We are not planning to forecast these indicators for international competitors of the EU.


Table 26 Forecast of Indicators 5 and 6

Indicators Indicators Forecast Output 2015-2020 Methodology approach

5. Data workers skills gap

Gap between demand and supply of data workers, absolute number, total EU

Development of additional assumptions on data skills supply trajectories for the alternative scenarios

Projections of indicator from baseline year 2014 to 2020 under alternative scenarios

6. Citizens’ reliance on the data market

Share of the population with a high reliance on data products and services – total EU

If possible, split by age and gender

Development of additional assumptions on consumers’ orientation to data use for the alternative scenarios

Projections of indicator from baseline year 2014 to 2020 under alternative scenarios

We are not planning to forecast these indicators for international competitors of the EU.

2.8.4. Steps 5 and 6: Communication, validation and final revision

The scenarios and market forecasts will be circulated in draft to collect feedback from the EC and the stakeholders before the final validation.

2.9. Indicators for Worldwide Monitoring

The market monitoring should cover, with an approach similar to the EU one, a core group of indicators for three leading international competitors of the EU in the data market: the US, China or Japan, and Brazil. We will first verify the availability of data in China and, if this proves too low, we will monitor Japan instead. The main output will be a country report for each of these countries based on a standardized template, clarifying the scope of comparison with the EU indicators.

In the following paragraphs we outline which indicators we plan to measure in the selected countries and give a first assessment of the availability of data and the feasibility of the indicators. At this stage we present this by country rather than by single indicator. The main goal is comparability with the EU, so if an indicator cannot be measured in the EU, it will not be monitored in the other countries. Naturally, other information on each country’s data market resulting from the desk research will be presented in the report.

2.9.1. Description of International Indicators

Indicator 1 – Number of Data Workers

Our goal will be to collect data on the number of data workers and their share of total employment in each country. We expect that a comparison of the intensity of data workers per company will not be feasible, so we plan to drop indicator 1.3.

The definition of data workers should be the same as the one used in the EU; otherwise, we will clarify which definition is used in these other countries.


Table 27 – International Monitoring, Indicator 1

Indicator 1 - Description

N. Name Description Type and Time Segmentation

1.1 Number of data workers

Total number of data workers

Absolute number,

2013-2014 est.

By Geography: total country

If possible, by Industry: comparable to the 11 sections used for EU according to NACE 2

1.2 Employment share

Total number of data workers compared with the total employment in the country

% of data workers in total employment,

2013-2014 est.

By Geography: total country

If possible, by Industry: comparable to the 11 sections used for EU according to NACE 2

Indicator 2 – Number of Data Companies

Our goal will be to collect data on the number of data supplier companies and data user companies, as defined above for the EU. We do not plan to measure the indicator on the share of data companies in the total by sector, since the relevant universe (the sectors examined) is likely to differ from the EU one and not be comparable to it.

Table 28 – International Monitoring, Indicator 2

Indicator 2 - Description

N. Name Description Type and Time Segmentation

2.1 Number of data companies

Total number of data companies

Absolute number, 2013-2014 est.

By Geography: total country

If possible, by Industry: comparable to the 11 sections used for EU according to NACE 2

If possible, by company size: over/below 250 employees

Indicator 3 – Revenues of Data Companies

Our goal will be to collect data on the revenues of data supplier companies, as defined above for the EU. We do not plan to measure the indicator on the share of data companies’ revenues in the total by sector, since the relevant universe (the sectors examined) is likely to differ from the EU one and not be comparable to it.


Table 29 – International Monitoring, Indicator 3

Indicator 3 - Description

N. Name Description Type and Time Segmentation

3.1

Total revenues of data companies

Total revenues generated by the companies specialized in the supply of data-related products and services

Billion €, 2013-2014 est.

By geography: Total country

If possible, by company size: over/below 250 employees

Indicator 4 – Data Market Size

These indicators are particularly relevant. We plan to measure indicator 4.1 provided that we can identify a comparable perimeter for the value of data-related products and services. Alternatively, we can compare IDC’s estimates of the Big Data market size in the EU and in the examined countries. According to our definition, IDC’s Big Data market is a subset of the EU data market.

As for the value of the data economy, its measurement depends on the impact model, so its feasibility will depend on the availability of sufficient information about the indirect and induced impacts in each of the examined countries.

Table 30 – International Monitoring, Indicator 4

Indicator 4 - Description

N. Name Description Type and Time Segmentation

4.1 Value of the data market

Estimate of the overall value of the data market

Billion €, 2013-2014 est.

By geography: total country

4.2 Value of the data economy

Value of the data market plus direct, indirect and induced impacts on the economy

Billion € 2013-2014

By geography: Total country

4.3 Incidence of the data economy on GDP

Ratio between the value of the data economy and GDP

%, 2013-2014

By geography: Total country
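The relationship between indicators 4.1, 4.2 and 4.3 in the table above can be sketched as follows. The sketch assumes that the direct, indirect and induced impacts have already been estimated by the impact model, which it does not reproduce. The function names are ours and the figures are hypothetical.

```python
def data_economy_value(market: float, direct: float, indirect: float, induced: float) -> float:
    """Indicator 4.2 (sketch): data market value plus direct, indirect and induced impacts, bn EUR."""
    return market + direct + indirect + induced


def incidence_on_gdp(economy_value: float, gdp: float) -> float:
    """Indicator 4.3 (sketch): value of the data economy as a percentage of GDP."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return 100.0 * economy_value / gdp


# Hypothetical figures for one country and year (bn EUR), for illustration only.
economy = data_economy_value(market=50.0, direct=30.0, indirect=15.0, induced=5.0)
share = incidence_on_gdp(economy, gdp=2000.0)
print(f"Data economy: {economy:.1f} bn EUR, {share:.2f}% of GDP")
```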

Indicator 5 – Data workers skills gap

This indicator is relevant for its policy value. We will probably not be able to estimate the data skills gap in exactly the same way as defined above for the EU, particularly because the national education systems (the supply side) are not comparable. However, it should be possible to compare the demand for data workers and to collect some information on the potential forecast gap in each country.


Table 31 – International Monitoring, Indicator 5

Indicator 5 - Description

N. Name Description Type and Time Segmentation

5.1 Data Workers Skills Gap

Gap between demand and supply of data workers

Absolute number, 2013-2014 est.

By Geography: total country
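Here is a minimal sketch of the gap computation. It assumes that demand and supply estimates of data workers are available for the same years, and the figures are hypothetical. A positive value means a shortage of data workers and a negative value means a surplus.

```python
def skills_gap(demand: dict[int, int], supply: dict[int, int]) -> dict[int, int]:
    """Gap per year = demand - supply (absolute number of data workers)."""
    return {year: demand[year] - supply[year] for year in sorted(demand) if year in supply}


# Hypothetical demand and supply of data workers, for illustration only.
demand = {2014: 1_000, 2020: 1_600}
supply = {2014: 900, 2020: 1_300}
gap = skills_gap(demand, supply)
print(gap)  # {2014: 100, 2020: 300}
```

Years that appear in only one of the two series are skipped rather than guessed.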

Indicator 6 – Citizen’s Reliance on the Data Market

This indicator is very experimental and at this stage it would be difficult to design a comparable one for the examined countries, so we do not plan to measure it. We will, however, check whether there are useful data on citizens’ use of data market products and services which may help in the analysis of this indicator in Europe.


2.9.2. Main Data Sources

Quantitative data for all indicators - overview (country: public sources; IDC & other private sources; comments)

China: National Bureau of Statistics of China; IDC, Forrester, McKinsey Global Institute; according to IDC, big data ranks very high on the Chinese policy and industrial development agenda.

Japan: Statistics Bureau of Japan; IDC, Yano Research Institute; Japan’s government growth strategy sees Big Data as a key opportunity.

USA: U.S. Bureau of Economic Analysis (BEA), Census Bureau, USA.gov; IDC, IBM, McKinsey Global Institute, all market research firms; leading world market.

Brazil: Instituto Brasileiro de Geografia e Estatística (IBGE); IDC, Frost & Sullivan; the Brazilian big data market is in a very early development phase.

The main data sources for these countries fall into two groups:

The national statistical offices, which provide employment and enterprise statistics but usually no data specific to the data market;

IDC, as well as many other market research institutes and leading consulting companies, which produce research and data on all these markets. IDC’s research is more specifically focused on data.

The research by IDC is a good starting point for this study because it is produced with the same methodologies. IDC runs a worldwide Big Data technologies and services revenues forecast with segmentations for the main world regions.

Selected relevant IDC published research

IDC Worldwide Big Data technology and services 2013-2016 Forecast

This study examines the Big Data technology and services market for the period from 2010 to 2015. Worldwide market sizing is provided for 2010, and a five-year growth forecast for this market is shown for 2011–2015. The Big Data market is an aggregation of storage, server, networking, software, and services market segments, each with several subsegments.

IDC’s recently published research on the Chinese data market:

China Big Data Technology and Services Market 2013–2017 Forecast and Analysis

Best Practices: Cathay United Bank's Marketing Transformation: Getting Analytics-Led Marketing Right

Big Data Enables Clinical Decision Support In Hospital Settings

IDC’s recently published research on the Japanese data market:

Japan Big Data Software Market Vendor analysis 2013

IDC’s recently published research on the Brazilian IT market:


Latin America Spending forecast by segment, June 2014

2.9.3. Gap analysis

Availability of data; Quality and reliability of data

Feasibility of indicators

Country Quantitative data for all indicators - overview

Public sources

IDC & other private sources

China: Medium; High; Medium. Complex market, comparability and coverage problems

Japan: Medium; Medium; Medium. Complex market, comparability and coverage problems

USA: High; High; High. Plenty of data, comparability problems

Brazil: Medium; Low; Low. Scarcity of data, early stage of development

The public sources (mainly the statistical offices) are available, but only in the national language, and require local analysts to search them. Moreover, data with the level of detail and depth required to apply the methodology used in Europe to identify data workers and data companies are fully transparent and downloadable only in the US.

The quality and reliability of data is high in the US, even if there will be comparability problems because of different definitions, but less satisfactory in China and Japan because of the high complexity of the national markets and language problems, as well as some difficulty in collecting all the data. In Brazil data are scarce because the market is in a very early phase of development. We have found surprisingly many sources in China, even though there may be a “hype” phenomenon (lots of talk about the data market, a less advanced reality). We will need to carry out further analysis of data availability before deciding whether China or Japan is the more interesting market for comparison.

Overall the feasibility of indicators is high only in the US, but it should be possible to develop broadly comparable data about the overall data market in all the targeted countries.

2.9.4. Measurement approach and output

The measurement approach for the 3 international countries to be monitored will include:

Desk research on the main public and private sources for each of the 4 countries; the choice between China and Japan will be made at the end of this phase;

Assessment of feasibility of the main selected indicators based on their level of comparability with the EU;

Adaptation of the estimate models developed for the European indicators and of the main assumptions;


Measurement of indicators;

Development of a qualitative profile for each country focusing on the interpretation of the main indicators;

Comparative analysis with the EU.

This will be carried out within WP2 as described in the Inception report for the international monitoring.

The main outputs will be:

A quali-quantitative profile of each of the examined countries based on a structured template covering the monitoring areas of the 5 indicators described above. However, we cannot guarantee that it will be possible to measure the indicators in exactly the same way as in the EU.

A final summary with the conclusions about the comparative analysis with the EU for all the monitored areas.


3. Design of the EDM Monitoring Tool

3.1. Overview

Figure 5 below presents the main components of the European data market monitoring tool. The EDM monitoring tool will be implemented in the first year and, if necessary, its design will be revised and updated in years 2 and 3 in the Interim reports.

The monitoring tool has a modular structure, sufficiently flexible to adjust to the market evolution. The main components are the following:

Data market taxonomy, presented in chapter 1 and annexed to this report;

Data value chain, presented in chapter 1;

Assessment of Framework Conditions, discussed in this chapter, par. 3.3;

Design of indicators. The study team has developed 6 main indicators, which have been presented in chapter 2 based on a standardized template, specifying their definition, statistical reference, segmentation, gap analysis of data sources, preliminary feasibility assessment, data collection method and measurement approach. This description has also included the segmentation feasibility (by country, sector and company size).

Indicators scope, segmentation and measurement. This module concerns the actual measurement of the indicators, implementing the design described above. The final scope and segmentation of the indicators will ultimately depend on the full availability and quality of the data collected. The results of this module will feed into the country report templates and the EU overall report template.

Data collection methodologies. In chapter 2 we have finalized the identification of the data collection methodologies for each indicator. For each indicator we have assessed the following:
o The availability, quality and reliability of existing public sources such as Eurostat;
o The availability, quality and reliability of IDC data;
o The possible combination of the two;
o The need for ad hoc field research, which has been described above.

Quality control and validation process. Quality control will be performed by the external experts and by the EC on all the steps of the monitoring tool design, development and implementation. This will be based on full transparency of the sources and development process of the monitoring tool. After a first validation by the experts and the EC, the monitoring tool will be shared with the stakeholder community for further feedback and validation. The process of quality control of the tool and of the indicators is described in par. 3.4.

Production of datasets: year 1, 2, 3. This will entail the development of the templates for the datasets to be filled in and then provided in machine-readable format, and will be executed by WP2.

Assessment of progress towards key policy targets. Based on the list of selected indicators corresponding to the main quantitative policy targets indicated by the EC, the study team will indicate in the Monitoring tool how the assessment of progress will be made, including for example the evaluation of the speed of increase of data workers and data-based companies (if any). This approach is described in the following par.3.5. The results of this progress assessment will be presented in the interim reports and in the final report. This will be executed by WP2.


Figure 5 The EDM Monitoring Tool (European Data Market Monitoring Tool. Components: Data Market Taxonomy covering type of data, stakeholders, skills, technologies, tools, applications and services; Design of the Data Value Chain; Assessment of Framework Conditions; Indicators 1 to 6, namely number of data workers, number of data companies, revenues of data companies, data market size, data workers skills gap and citizen’s data economy, each with description, measurement approach and data sources; Indicators scope, segmentation and measurement, gap analysis; Data collection methodologies; Quality control and validation; Production of datasets’ templates, Year 1 to Year 3; Assessment of progress towards key policy targets)

3.2. Implementation approach

The implementation of the EDM monitoring tool will include the following steps, already anticipated in the D1-Inception report:

Finalization of the EDM design and methodology;

Implementation of the EDM in Europe and the rest of the world (according to the timing anticipated in the D1 workplan):
o Development of research instruments (for the desk and field research);
o Data collection and measurement of indicators;
o Organization and implementation of field research;
o Production of country profiles/data sets (2015/2016);

Assessment of the Framework Conditions;

Market Forecasting and Impact Model;

Assessment of progress towards key policy targets;

Quality Control & Validation of indicators;

Production of the main deliverables presenting the results of the monitoring activities, that is, the First Interim Report (D6), the Second Interim Report (D8) and the Final Study Report (D9).

The implementation of the EDM will follow the approach identified in the D1 Inception report. In the following paragraphs we describe the steps of the methodology on which further clarification was needed compared to the workplan outlined in the D1.


3.2.1. Finalization of the EDM design and methodology

This report presenting the EDM modules and the methodology for all indicators will be:

Sent to the EC for feedback and approval;

Sent to the peer reviewers Francesco Daveri and Jonathan Cave for structured feedback, based on a validation template.

The main aspects on which we will request feedback and validation will be:

The overall methodological approach;

The taxonomy with the main definitions;

The definition of each indicator and its scope;

The assessment of the feasibility of the indicators;

The methodology for data collection and field research;

The measurement methodology of the indicators;

The main reference sources (whether any relevant sources are missing or any sources are not used appropriately);

The main risks and potential weaknesses of the selected methodology, and suggestions on how to manage them.

Based on the feedback and inputs from the EC and the peer reviewers, IDC will revise and finalize the design of the main components of the monitoring tool and update this Methodology report and then start the implementation.

Once approved, the methodology and the taxonomy will be uploaded on the project website and shared with the stakeholder community for further feedback.

It should be noted that the implementation phase may not meet some of our expectations and/or may require further adjustments of the methodology, as it is impossible to anticipate all the factors influencing the quality and relevance of the data to be collected. This will be managed by the study team during the measurement process.

At the end of the first cycle of measurement and for the final report IDC will revise and (if needed) update the design of the tool based on the experience of the implementation and the evolution of the market.

3.3. Assessment of Framework Conditions

As anticipated above, the framework conditions identify the main factors which will enable or prevent the development of the European data market and economy. As indicated in our Data Value Chain design (Figure 2), we have divided the framework conditions into two main groups:

Policy/regulatory conditions

Market development (non-regulatory) conditions

The framework conditions will be identified and classified on the basis of desk research, to be carried out in WP2 in parallel with the field research activities. This will include:

Review the main existing studies on the conditions for data market development

Identify the main drivers of and barriers to market development, with specific attention to potential bottlenecks due to policy and regulation (either because of legal constraints or because of a lack of regulatory action)

Assess their relative relevance and potential impact on the market development based on clear and transparent criteria


Identify the potential countermeasures which may help reduce barriers, accelerate drivers and avoid risks of market underdevelopment

The preliminary identification of the main framework conditions to be assessed is reported below.

3.3.1. Policy/ regulatory Conditions

Data Privacy, Data Protection, Data ownership.

The EU is well known for its stringent data protection and data privacy regulation. There is increasing concern that the new Data Protection directive may create substantial constraints on the development of the Big Data market; industry actors often voice these concerns in IDC surveys. On the other hand, European consumers are generally satisfied with the regulation. As a recent OECD paper on the policy challenges related to Big Data noted:

“When the Privacy Guidelines were adopted, data flows involved a limited number of data sources, which were connected through closed networks. This environment allowed policy makers to make a single actor (the “data controller”) responsible for every aspect of processing (collection, use, security, data quality, etc.). The transition from a closed network environment to an open network environment has made it increasingly difficult to maintain this approach. Instead of discrete, well-defined transfers of information, many data-driven goods and services typically involve a multiplicity of information flows, with many different actors, each of which exercises varying degrees of control.”

Copyright regulation

The rise of the data economy also raises serious issues in terms of intellectual property rights and the way they should be enforced and respected. Regulatory intervention is needed in this area to ensure that copyright is exercised in ways that facilitate the re-use of data and allow wider access to data. Without such an approach, much of the potential of the data economy could be hindered and remain unexploited.

Cyber security

Cyber security threats also pose a serious regulatory challenge as the volume and value of data produced and exchanged in the data economy increase. According to several sources, the theft of electronic data has come dangerously close to, if not surpassed, losses of physical property. This demonstrates the corporate value of intangible assets and the consequent need to offer adequate legal protection to intangible as well as tangible assets. Data companies, and indeed an increasing number of other organizations, are forced to adapt their security policies to the more open and dynamic environment in which data are widely exchanged and used today. As a result, a coherent and effective policy framework has to be developed at European level in the area of cyber security.

3.3.2. Market development – non-regulatory conditions

The development of the Digital Single Market, including the potential relevance and role of the European Digital Service Infrastructure;

The R&D challenges (including for example interoperability and multilingual technologies, in order to reduce barriers for EU start-ups, and improve global competitiveness);

Access to capital: the availability of venture capital and business angels for new enterprises, and issues specific to this market (without an in-depth analysis of venture capital issues in Europe, which is not the focus of this tender);

Availability of data talents and skills. This can be a major problem, since in the ICT industry there is broad concern that the future supply of data scientists will be insufficient to meet emerging demand.


The availability of innovation services, such as existing accelerators, incubators, and initiatives to develop start-ups in the data market by governments or private enterprises (such as the SAP innovation program).

3.4. Design of the quality control process

In this Methodology Report we have already carried out the gap analysis, a preliminary quality control of the indicators based on a check of data availability and indicator feasibility against the main existing sources identified through desk research. A complete quality control of the indicators will be carried out once they have been measured.

The EDM Monitoring tool includes a process of quality control and validation of the indicators. The study team will pay specific attention to the quality of the indicators, which is the key success factor of the monitoring tool. The quality control is based on the feasibility and quality assessment methodology developed by IDC on behalf of EC DG Markt for the assessment of e-Procurement benchmarking indicators (see footnote 5), whose main goal was to develop a benchmarking capacity for periodic monitoring.

The methodology has been adapted to the specific context of the data market. It is based on a comprehensive evaluation of the feasibility and quality of the indicators, leading to their revision and fine-tuning. The quality criteria targeted by the study team include feasibility, reliability and clarity, comparability, flexibility, and representativeness of the balance of experiences across the EU landscape.

The main criteria of the assessment to be used are the following:

Availability: this refers to the level of availability of the data needed for the specific indicator, in the format required by the indicator. Obviously, without availability no indicator can be measured, so this is a necessary but not sufficient condition of feasibility. This assessment will be repeated after the measurement of the indicators based on the availability of data for all the segmentations foreseen (by country, by industry, by company size).

This assessment will be based on the evidence of the collected data sources.
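The availability check for every foreseen segmentation (by country, by industry, by company size) amounts to scanning the full grid of segment cells for gaps. The sketch below assumes, hypothetically, that collected data is held in a dictionary keyed by (country, industry, size) tuples; the country, industry and size codes are illustrative placeholders, not the study's actual lists.

```python
from itertools import product


def missing_cells(data, countries, industries, sizes):
    """Return the (country, industry, size) cells with no usable value.

    `data` is a hypothetical dict mapping (country, industry, size) tuples
    to a measured value; a missing key or a None value counts as unavailable.
    """
    return [
        cell
        for cell in product(countries, industries, sizes)
        if data.get(cell) is None
    ]


def availability_share(data, countries, industries, sizes):
    """Share of all foreseen segmentation cells for which data is available."""
    total = len(countries) * len(industries) * len(sizes)
    missing = len(missing_cells(data, countries, industries, sizes))
    return (total - missing) / total


# Hypothetical example: 2 countries x 2 industries x 2 size classes = 8 cells.
countries = ["DE", "FR"]
industries = ["finance", "retail"]
sizes = ["SME", "MLE"]
data = {cell: 1.0 for cell in product(countries, industries, sizes)}
data[("FR", "retail", "SME")] = None  # one gap in the collected sources

print(missing_cells(data, countries, industries, sizes))  # [('FR', 'retail', 'SME')]
print(availability_share(data, countries, industries, sizes))  # 0.875
```

Listing the missing cells, rather than only the share, makes it easy to document which segmentations of an indicator cannot yet be measured.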

Reliability: it refers to the quality of the indicator and verifies whether the definition of the indicator is clear, coherent, consistent, shared and understandable by the organizations collecting the data; it means also that the indicator is measured through a clear, coherent and objective measurement scale; and that the necessary data is of good quality, unbiased and complete. High reliability means that the indicator responds to all these characteristics and therefore can be consistently measured over time and can be aggregated across countries and at the EU level.

This assessment will be based on the evaluation of the study team members, through the use of evidence-based criteria. Each evaluation will be clearly motivated by the evaluator and verified by the peer reviewers.

Value added: this refers to the capability of the indicator to measure progress towards the policy objective in a clear and objective way and/or to have a high explanatory power of the measured phenomenon; it implies the presence of objective benchmarks to measure progress. This is a very important attribute of the indicator, because the value added guarantees the ultimate achievement of the main goals of the monitoring tool.

This assessment will be based on the criteria described below on the potential contribution of each indicator to the measurement of progress towards the main policy objectives.

5 Study on e-Procurement Measurement and Benchmarking MARKT 2011/097/C - Lot 1 – Public Procurement Performance Indicators


Scalability, comparability and representativeness: we need to verify that the basic indicators can be scaled up, that is, aggregated and extrapolated from the country level to the EU level while maintaining full comparability at all levels.

The scalability and comparability of the indicators will be assessed by processing the measurement results and aggregating them at EU, Member State and industry level. This will be clearly documented through evidence-based criteria.

The representativeness of the main indicators will depend on their coverage of the geographies (e.g. the 28 Member States), of the targeted industries (the 11 sectors) and of the company size segmentation (SMEs vs MLEs).
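Scaling an additive indicator (for example, a count of data workers) from country level to EU level can be sketched as a sum over the measured countries plus an extrapolation for the countries without data. The extrapolation rule here is an assumption chosen for illustration: a simple ratio estimator that divides the observed total by the measured countries' share of a hypothetical reference weight such as total employment. The same coverage figure doubles as a crude representativeness measure.

```python
def aggregate_to_eu(country_values, reference_weights):
    """Aggregate an additive country-level indicator to an EU-level estimate.

    country_values: dict mapping country code -> measured value; countries
        without a measurement are simply absent.
    reference_weights: dict mapping every in-scope country code -> a
        hypothetical reference weight (for example total employment).

    Returns (observed_total, coverage, extrapolated_total):
    - observed_total: plain sum over the measured countries;
    - coverage: measured countries' share of the total reference weight;
    - extrapolated_total: observed_total scaled by 1 / coverage, an assumed
      ratio estimator used here only for illustration.
    """
    unknown = set(country_values) - set(reference_weights)
    if unknown:
        raise ValueError(f"countries without a reference weight: {sorted(unknown)}")
    observed_weight = sum(reference_weights[c] for c in country_values)
    if observed_weight == 0:
        raise ValueError("no coverage: cannot extrapolate")
    observed_total = sum(country_values.values())
    coverage = observed_weight / sum(reference_weights.values())
    extrapolated_total = observed_total / coverage
    return observed_total, coverage, extrapolated_total


# Hypothetical three-country example: country "C" has no measurement.
weights = {"A": 50, "B": 30, "C": 20}
values = {"A": 100, "B": 60}
print(aggregate_to_eu(values, weights))  # (160, 0.8, 200.0)
```

A ratio estimator implicitly assumes that unmeasured countries behave like measured ones in proportion to the reference weight; when coverage is low, the extrapolated figure should be flagged as less reliable.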

3.4.1. Metrics and Output

The output of the assessment will be expressed on a three-level scale coded with a simple colour key:

High feasibility = Green;

Medium feasibility = Amber;

Low feasibility = Red.

Given the number of parameters to be assessed and the number of indicators, it is important to use a measurement scale that is simple and easy to communicate.
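The colour key can be sketched as an ordered three-level scale with one rating per criterion. The report does not state how per-criterion ratings are combined into an overall colour; the rule below (the indicator is only as feasible as its weakest criterion) is an assumption, chosen because it automatically treats availability as a necessary condition. The criterion keys are hypothetical names for the four criteria described above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Three-level feasibility scale, ordered so that min() picks the worst."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


COLOUR = {Level.HIGH: "Green", Level.MEDIUM: "Amber", Level.LOW: "Red"}

# Hypothetical keys for the four assessment criteria.
CRITERIA = ("availability", "reliability", "value_added", "scalability")


def overall_rating(scores):
    """Combine per-criterion levels into one traffic-light colour.

    Assumed aggregation rule (not specified in the report): the overall level
    is the lowest level across criteria, so a Low score anywhere, including
    availability, gives Red.
    """
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return COLOUR[min(scores[c] for c in CRITERIA)]


example = {
    "availability": Level.HIGH,
    "reliability": Level.MEDIUM,
    "value_added": Level.HIGH,
    "scalability": Level.HIGH,
}
print(overall_rating(example))  # Amber
```

A weighted average would be an alternative design, but it could let a strong score on one criterion mask an unavailable data source, which the "necessary but not sufficient" framing of availability argues against.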

The study team will present the results of the quality assessment of indicators together with the measurement results in the Interim report.

3.5. Assessment of progress on policy targets

The ultimate goal of this study is to define, assess and measure the European data economy, supporting the achievement of the Data Value Chain policy, which aims at developing a vibrant and innovative data ecosystem of stakeholders driving the growth of this innovative market in Europe.

As reported in the EC paper “Elements for a data value chain policy”, the concept of the data value chain refers to the life-cycle of data, starting with the generation of data and proceeding through their validation and further processing, leading to use and re-use in the form of new innovative products and services.

The following guiding principles underlie the Commission's strategic initiative on the data value chain:

1. a wide availability of good quality data, including the free availability of publicly-funded data;

2. free flow of data across the European Union, as part of the Digital Single Market;

3. finding the right balance between individuals' potential privacy concerns and the exploitation of the potential of the reuse of their data, while also empowering citizens to use their data in any way they wish.

The assessment of progress towards these policy targets requires a baseline scenario of the current state of the data value chain in Europe, as well as facts and figures on the main indicators and their trends. This will be the main result of the European Data Market monitoring tool presented in the previous chapter, once the designed indicators have been measured.

The following table shows how the indicators correspond to the key policy targets identified by the EU strategy. More specifically:

The measurement of the data market's main components through the indicators of the monitoring tool will provide, for the first time, a clear picture of the state of development of the emerging data industry and the level of use of data in Europe, together with a baseline from which to measure progress;

The measurement of indicators for the years 2013, 2014, 2015 and 2020 will make it possible to estimate growth rates, thereby assessing the dynamics of the market and estimating actual and potential progress;

The assessment of the main framework conditions corresponding to the basic principles outlined above, together with the analysis of existing barriers, will complement the measurement of indicators and make it possible to draw conclusions about progress towards policy targets;

Finally, the indicators will actually measure the achievement of relevant policy targets as shown in the table below.
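Because the measurement years are unevenly spaced (2013, 2014, 2015, then 2020), growth between them is best compared as a compound annual growth rate, CAGR = (V_end / V_start)^(1 / (t_end − t_start)) − 1. The sketch below applies this to each pair of consecutive measured years; the series values are hypothetical and are not study results.

```python
def cagr(start_value, end_value, start_year, end_year):
    """Compound annual growth rate between two measurement years."""
    if start_value <= 0 or end_value <= 0:
        raise ValueError("values must be positive")
    if end_year <= start_year:
        raise ValueError("end_year must be later than start_year")
    years = end_year - start_year
    return (end_value / start_value) ** (1 / years) - 1


def growth_rates(series):
    """CAGR for each pair of consecutive measured years in {year: value}."""
    years = sorted(series)
    return {
        (a, b): cagr(series[a], series[b], a, b)
        for a, b in zip(years, years[1:])
    }


# Hypothetical indicator values growing 10% per year (not study results).
series = {2013: 100.0, 2014: 110.0, 2015: 121.0, 2020: 121.0 * 1.1 ** 5}
rates = growth_rates(series)
print({k: round(v, 4) for k, v in rates.items()})
# {(2013, 2014): 0.1, (2014, 2015): 0.1, (2015, 2020): 0.1}
```

Annualising matters for the 2015 to 2020 interval: the cumulative increase over those five years is about 61%, but the comparable yearly rate is 10%, the same as in the earlier one-year intervals.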

Table 32 EDM Indicators and key policy targets

Key policy target | EDM indicator(s)

Increase the number of data-related jobs (at least 250,000 new data-related jobs in Europe in 2017) | 1.1 Number of data workers in Europe; 5.1 Gap between demand and supply of data workers

Increase the number of data-related start-ups and fast-growing SMEs | 2.1 Number of data companies – actual and forecast; results of the survey on start-ups and spin-offs

Increase the revenue generated based on data in the Member States | 3.1 Revenues of data companies – actual and forecast; 4.4.1 Value of the data market – actual and forecast

Improved use of data for decision-making processes in the private sector and the public sector | Story deliverables

Increase citizens' use of data for informed behavioural decisions | 6.1 Citizen’s data market


4. Next Steps

4.1. Next steps

Once submitted to the European Commission, the present Methodology Report will be discussed in detail in the course of the First Interim Meeting to be held at the Commission’s premises in Luxembourg on the 2nd July 2014.

The study team, led by Cattaneo and supported by Lifonti, will subsequently present the Methodology Report to Daveri and Cave for peer review. The peer reviewers will have one week to provide structured feedback based on a quality control template (see Inception Report, D1, par. 3.7.2), checking in particular the overall coherence, the scientific quality and the conformance of the Methodology Report to the study’s objectives.

The input of the peer reviewers, together with the European Commission’s feedback, will be integrated by the study team, and a final version of the Methodology Report will be delivered to the Commission during July 2014.

The final Methodology Report, incorporating the peer review process and the European Commission’s feedback, will constitute the basis for proceeding to the execution of Work Package 2, according to the Work Package descriptions outlined by the Study Team in the Inception Report.

Work Package 2 will start with the Finalization of the Monitoring Tool (Task 2.1) as designed in the present Methodology Report and will consist of a careful review of the Monitoring Tool’s modules in the light of the quality control process, peer review process and feedback provided by the European Commission at the completion of Work Package 1.

The actual implementation of the Monitoring Tool (Task 2.2) will then take place from July 2014 to February 2015. Led by Cattaneo with the support of Bonagura and Aguzzi, task 2.2 will develop the necessary research instruments to collect the required data and measure the identified indicators and will be accompanied, in parallel, by the organization and implementation of primary research to collect additional data and evidence through surveys and qualitative interviews. The study team will then proceed to the production of the First Interim Report (D6) which is expected to be finalized at the end of March 2015.


Annex 1 Taxonomy, Release II

Introduction

This document is Annex 1 to the Methodology Report D2 of the study SMART 2013/0063 European Data Market, entrusted by the European Commission DG CONNECT to IDC and Open Evidence. It presents Release 2 of the taxonomy, updated on the basis of desk research and of the data companies mapping and classification presented in D.4.1, the “European Data Landscape” (visible at http://datalandscape.eu/). This includes the definitions used for:

Data and type of data; Data market, data economy, data workers, data scientists, data companies; Data skills; Data-based products and services; Main Stakeholders; Main Framework Conditions.

These definitions are presented through a structured template which has been made available for browsing and integration in the stakeholder community.

The data market taxonomy is a living document which will be completed and updated as the study proceeds, also on the basis of feedback from the stakeholders and the EC.

Design of the Data Value Chain

The following figure presents the preliminary design of the Data Value Chain reflecting the data ecosystem. This is our starting point for the analysis reflecting the conceptual framework with which we approach the measurement of the data market and the data economy.


Figure: A view of the European Data Ecosystem (Source: IDC 2014)

Data value chain: data collection and creation; storage, aggregation, organization; analysis, processing, marketing and distribution; primary use and re-use.

Microeconomic impacts: cost savings; increased flexibility thanks to timely and improved decision making; new products/services; improved customer services; increased revenue.

Macroeconomic impacts: GDP growth; SMEs and jobs creation; data-driven competitiveness of EU industry.

Framework conditions of development of the European data economy: policy/regulatory (data privacy, data ownership, copyright, security) and non-regulatory (skills, infrastructures, interoperability and standards, access to risk capital).

Stakeholder categories: ICT enablers and cross infrastructures; data holders; new intermediaries; final users (internal/external use); vertically integrated suppliers.

The figure is composed of the following elements, which describe the structure of the data economy:

The data value chain shows the 4 main phases of manipulation of data which lead to its exploitation;

The macroeconomic and microeconomic impacts identify the direct and indirect impacts of the data value chain on the economic system and user enterprises;

The stakeholder categories identify the main type of actors on the basis of their role in the data value chain;

The framework conditions identify the main factors which will enable or prevent the development of the European data market and economy. They are divided into policy-regulatory framework conditions and non-regulatory conditions.

While we have identified multiple stakeholders with multiple roles, the leading global web platforms such as Amazon, Google and Facebook clearly dominate the whole value chain; their vertical integration and market dominance represent a huge competitive advantage. Framework conditions (for example, the Digital Single Market), by contrast, are of primary relevance for all the other players, particularly native EU players.

Within the framework of the present study, the main steps of the data value chain to be taken into consideration are as follows:

Collection of, or access to, data from a myriad of sources within the applicable legal framework. Collection can be direct (for example, through loyalty schemes operated by retailers, transport and hospitality service providers) or indirect (for example, by recording the location of someone using a mobile phone). Data can also be created through analysis rather than being captured;


Storage and aggregation by service providers and social networks, but also by companies in traditional sectors such as finance, retail, transport, utilities, government;

Processing, analysis, marketing and distribution: merging data from different sources (public, proprietary or institutional research) and relying on analytics to derive insights and value. Traditional players across vertical markets can perform this task if they have the necessary skills and technology; alternatively, they can rely on external data brokers and providers;

Usage, both in the public and private sectors to better serve customers and/or improve efficiency.

Data Market Definitions

The following definitions are mainly based on OECD reports (2011, 2013)6 about data and the data economy. For the sake of completeness, these definitions will be discussed in the Methodology Report D2.

Definition of Data

Key terms Definition

Data Data is usually defined as qualitative or quantitative statements or information which can be coded and which are assumed to be factual and not the product of analysis or interpretation. For the sake of this study we consider only data which is collected, processed, stored, and transmitted over digital information infrastructures and/or elaborated with digital technologies. This definition includes multimedia objects which are collected, stored, processed, elaborated and delivered for exploitation through digital technologies (for example, images databases).

Information Information is the output of processes that summarise, interpret or otherwise represent the content of a message to convey meaning. Information is therefore not a mere synonym of data.

Data Economy and Data Market

Key terms Definition

Data Market The data market is the market where digital data is exchanged as "products" or as "services" derived from raw data. The exploitation of the exchanged data enables a better understanding of the environment, and helps improve existing services, increase efficiency, and eventually launch new products/services, also in the more traditional sectors of the economy (such as manufacturing, transport or retail).

Data Economy The data economy involves the generation, collection, storage, processing, distribution, analysis, elaboration, delivery and exploitation of data enabled by digital technologies. The data economy includes also the direct, indirect and induced effects of the data market on the economy.

6 OECD, Exploring the Economics of Personal Data: a Survey of Methodologies for Measuring Monetary Value, OECD Digital Economy Papers, n. 220

OECD, Exploring Data-Driven Innovation as a New Source of Growth: Mapping the Policy Issues raised by Big Data, OECD Digital Economy Papers, n. 222


Knowledge Economy

It should be noted that the data economy is not synonymous with the knowledge economy, a broader concept which can be defined as follows.

We define the knowledge economy as production and services based on knowledge-intensive activities that contribute to an accelerated pace of technical and scientific advance, as well as rapid obsolescence. The key component of a knowledge economy is a greater reliance on intellectual capabilities than on physical inputs or natural resources.

Internet Economy

“The Internet economy is defined as covering the full range of our economic, social and cultural activities supported by the Internet and related information and communications technologies”.7

Data-related Companies

Key term Definition

Data Companies Data companies are data suppliers, meaning that their main activity is the production and delivery of data-related products, services and technologies. These companies constitute the emerging data industry. Data companies may be start-ups, innovative SMEs or spin-offs of larger enterprises. Most of them originate from, or are currently classified within, the ICT industry, because the core technology they use is Big Data technology.

Data Users

Key term Definition

Data Users Data users are organisations that rely heavily on data to accomplish their mission: they generate and exploit their own data, collect online customer data intensively, subject this data to sophisticated analyses (such as controlled trials and data and text mining), and use what they learn to improve their business.

Data workers and Data Scientists

Key terms Definition

Information or Knowledge workers

As the data economy is not synonymous with the knowledge economy, for the same reasons data workers do not coincide with knowledge workers.

Information or Knowledge workers in the most basic definition are persons employed to produce or analyse ideas and information. Multiple sources define knowledge workers as workers creating knowledge capital, who process existing information to create new information used to define and solve problems. They include, for example, medical practitioners, lawyers, judges, teachers, architects, engineers, managers or salespeople. Their main capital is knowledge, and they are mainly focused on "non-routine" tasks.

7 “Measuring the Internet Economy: A Contribution to the Research Agenda”, OECD Digital Economy Papers, 2013

Data workers Data workers collect, store, manage and analyse data as their primary activity. Data workers can be knowledge workers if they are focused on non-routine tasks. For example, data entry clerks' primary activity is related to data, so they are data workers; however, data entry is a very routine task, so data entry clerks should not be considered knowledge workers. Another category of data workers is data analysts, who usually extract and analyse information from a single source, such as a CRM database. They require a medium level of creative thinking and usually work on structured data.

Data scientists Data scientists require solid knowledge in statistical foundations and advanced data analysis methods combined with a thorough understanding of scalable data management, with the associated technical and implementation aspects (European Big Data Value Partnership Strategic Research and Innovation Agenda, April 2014). They can deliver novel algorithms and approaches such as advanced learning algorithms, predictive analytics mechanisms, etc. Data scientists should also have a deep knowledge of their businesses; the most difficult skills to find include advanced analytics and predictive analysis skills, complex event processing skills, rule management skills, business intelligence tools and data integration skills (UNC, 2013).
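The distinction between data workers and knowledge workers rests on two independent tests, so the two categories can overlap. The sketch below encodes them as boolean flags on a hypothetical occupation record. It is a deliberate simplification: it reduces the knowledge worker definition to the non-routine test and does not model the additional skill requirements that distinguish data scientists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occupation:
    """Hypothetical occupation record used only for illustration."""
    name: str
    primary_activity_is_data: bool  # main job is collecting, storing, managing or analysing data
    non_routine: bool               # work is mainly focused on non-routine tasks


def classify(occupation):
    """Apply the taxonomy's two tests; the resulting categories can overlap."""
    labels = set()
    if occupation.primary_activity_is_data:
        labels.add("data worker")
    if occupation.non_routine:
        labels.add("knowledge worker")
    return labels


print(sorted(classify(Occupation("data entry clerk", True, False))))  # ['data worker']
print(sorted(classify(Occupation("lawyer", False, True))))  # ['knowledge worker']
```

Returning a set rather than a single label reflects the point made in the definitions: a data worker focused on non-routine tasks is also a knowledge worker.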

Data-driven Innovation

In this section, we provide an overview of the 5 innovation-related areas driven by the use of data.

Data-based creation of new products (goods and services)

This includes the use of data as a product (data products) or as a major component of a product (data-intensive products).

Data-driven processes

Use of data to optimize or automate production or delivery processes. This includes the use of data to improve the efficiency of distribution of energy resources (“smart” grids), logistics and transport (“smart” logistics and transport)

Data-driven marketing, data-driven product design

Use of data to improve marketing, for instance by providing targeted advertisements and personalized recommendations or other types of marketing-related discrimination; and the use of data for experimental product design

Data-driven organization, data-driven decision making

Use of data for new organizational and management approaches or for significantly improving existing practices

Data-driven R&D

Use of data to enhance research and development. This includes new data-intensive methods for scientific exploration by adding a “new realm driven by mining new insights from vast, diverse data sets”

Source: OECD (2013), “Exploring Data-Driven Innovation as a New Source of Growth: Mapping the Policy Issues Raised by "Big Data"”, OECD Digital Economy Papers, No. 222, OECD


Data Products, Services and Tools

The classification of main data products, services and tools is work in progress and will be updated based on the desk and field research of the next phases of the project. Boundaries between categories are still uncertain.

Type of Product/ Service/ Technology Description

Vertical solutions/Apps

Data-based apps/services combining mobile, cloud and social technologies, often dedicated to specific vertical markets (retail, healthcare, utilities, finance...)

Vertical solutions/ Marketing and online advertising

Data-based products and services for online advertising and marketing

Data analytics, Socialytics

Solutions and tools based on data analytics and/or elaborating real-time streaming of social data across a range of social networks

Data brokerage services and platforms

Software and consulting services leveraging Big Data tools and access to the data held by the company itself

Data marketplaces online markets for buying and selling finished SaaS applications and premium datasets; and/or providing access to complex and diverse data sources

Data analytics, data mining tools and technologies (including Big Data)

Technologies and tools for data mining and data analytics, including for example computational linguistics, semantic search, natural language processing, artificial intelligence for data and text mining solutions

Main Stakeholders

The data value chain identifies the following main stakeholder categories.

Data Holders

Data holders are public or private organizations owning or creating data. They can be vertically integrated (cover the whole value chain, from collection to use of the data) or rely on specialised intermediaries/ outsource the other steps of the value chain.

Type of data Type of stakeholder

Public Sector Information Government, Healthcare, Education

Personal data and user-generated content B2C (Web players, Telecom operators, Media, Retail, Finance, Utilities...)

Business data B2B (Web players, Telecom operators, Manufacturing, Utilities, Market research and Information services companies)

Research and Scientific Data University and research, Scientific publishers, Scientific databases


Data companies

Data companies are data suppliers, meaning that their main activity is the production and delivery of data-related products, services and technologies. These companies constitute the emerging data industry.

The following Figure shows the classification of the main categories of data market suppliers who will be analysed in the study, expanding the categories represented in the Data Value Chain Figure. As shown in the Figure below, most data companies originate from the ICT sector, but not all. The main groups we have identified are the following:

Integrated Suppliers

New/specialised intermediaries

ICT enablers and cross-infrastructure providers

Each of these main groups is presented below, leveraging IDC’s taxonomy to specify the type of tools and technologies falling into each group.

Figure: Main categories of data market suppliers (Source: IDC 2014). Groups: integrated suppliers; new intermediaries; ICT enablers and infrastructures. Segments shown: data marketplaces, data platforms, data brokers; analytics; vertical solutions/mobile apps/cloud apps/Big Data apps; tools and technologies; business & IT services; cloud computing; platform & IT infrastructure; connectivity infrastructure.

Integrated Suppliers

Classification Description

Traditional information services providers

They are market research companies and business data providers already active in the information services market, which exploit Big Data related tools and technologies to update their offering. They integrate the many phases of the data value chain, from data creation and collection to storage, analysis, primary and secondary use. For the sake of this study we will take into account only the activities, revenues and employees related to the provision of data-based services.

Vertically Integrated Suppliers

They are large organizations that hold and use their own data and enter the market to provide data-based services, although their original core business is not data provision (for example, telecom operators, utilities, financial services). They therefore integrate the main phases of the data value chain, from data creation and collection to data use, both primary and secondary.

For the sake of this study we will take into account only the activities, revenues and employees related to the provision of data-based services.

New - Specialized Intermediaries

New/specialised intermediaries are organizations whose core business is to develop and sell tools, products and/or services based on the re-use of data (including storage, aggregation, analysis) to other organizations. They can be cross-sector or specialised in specific vertical markets.

These enterprises leverage new digital technologies to help other organizations (data holders and users, who are sometimes the same) exploit and use data in innovative ways. They are innovative companies; some of them have been created by IT giants (such as Microsoft), but many more are start-ups or SMEs.

Classification Offering

Providers of Data Marketplaces, Data Platforms, Data Brokers, Social Data Platforms

Provision of services to store, elaborate and exchange data. They provide a mix of SaaS services and data, premium datasets, access to complex and diverse data sources

Providers of Data Analytics

Analytics and discovery software, including search engines, data mining, text mining and other text analytics, rich media analysis, and data visualization

Providers of vertical solutions / mobile apps/ cloud apps / big data apps

Applications software including business process or industry-specific applications such as for Web clickstream analysis, fraud detection, and logistics optimization

ICT Enablers and Infrastructure providers

ICT enablers are organizations providing tools and services that enable the management, storage, processing, analysis and distribution of data. Based on IDC's classification of the Big Data market, we identify the following segments.


Table 33 ICT Enablers and Infrastructure providers

Classification Data-related offering

Global IT and OTT actors (Google, Facebook, Microsoft, IBM, etc.)

Integrated provision of IT services and web-based services. Concerning the data value chain, they can be defined as integrated suppliers since they cover all phases of the value chain.

For the purposes of this study, we take into account only the activities, revenues and employees related to the provision of data-based services.

ICT Enablers

Providers of Software and Tools

Information management software, including parallel and distributed file systems with global namespace, highly scalable (size and structure) relational databases, key-value pair (KVP) data stores, content management systems, graph databases, XML databases, object-oriented databases, dynamic application data stores and caches, data integration, and event-driven middleware

Providers of business & IT services

Business consulting, business process outsourcing, IT project-based services, network consulting and integration services, IT outsourcing, storage services, security services, software and hardware support, and training services related to Big Data implementations

Cross Infrastructure

Cloud Computing Providers

Cloud infrastructure services that combine server, storage, and networking services, which are delivered through public cloud offerings

Providers of platforms & IT Infrastructure

Datacenter networking infrastructure used in support of Big Data server and storage infrastructure (specifically, IDC models spending based on its research into the following markets: Ethernet switches, Fibre Channel switches, InfiniBand switches, and application delivery; datacenters owned by enterprises and by cloud service providers are both counted)

External storage systems purchased by enterprises and cloud service providers, and direct purchases of HDDs by selected large cloud service providers (also including supporting storage software for devices, data replication, and data protection of Big Data storage assets)

Server revenue (including internal storage, memory and network cards) and supporting system software, as well as spending on self-built servers by large cloud service providers

Connectivity Infrastructure providers

This includes providers of fixed and mobile networks and the associated traditional services (e.g., voice messaging and text messaging), as well as data services.

For the purposes of this study, we take into account only the activities, revenues and employees related to the provision of data-based services.


Final Users

The last link of the value chain consists of the final users: organizations or individuals using data for their own internal purposes or to produce data-based products and services for their own customers. Essentially, they use data-based services to improve their efficiency, effectiveness and/or competitiveness.

Classification Definition

Public sector (government, healthcare, education)

Public sector organizations

Businesses

Business organizations

Citizens

Citizens

Innovative SMEs and start-ups

Organizations whose core business is to provide data-based products and services to end-users in specific vertical markets (they differ from the specialised intermediaries identified above)

Enabling Players

For the sake of completeness, the following classification identifies all the other stakeholders and enabling actors of the data market ecosystem.

Classification Examples

Education & Training

Higher Education Institutions

Access to Risk Capital

Venture Capitalists, Business Angels, Government funds, Development Agencies, Incubators, Accelerators

Policy Makers, Policy Regulators

Government, Public Bodies, European Institutions


Annex 2 Data workers – Selected ISCO codes

Table 34 ISCO-08 codes where data workers may be included

Levels: Digit 1 = major group; Digit 2 = sub-major group; Digit 3 = minor group; Digit 4 = unit group

1. Managers
  11. Chief executives, senior officials and legislators
    112. Managing directors and chief executives
      1120. Managing directors and chief executives
  12. Administrative and commercial managers
    121. Business services and administration managers
      1211. Finance managers
      1213. Policy and planning managers
      1219. Business services and administration managers not elsewhere classified
    122. Sales, marketing and development managers
      1221. Sales and marketing managers
      1222. Advertising and public relations managers
      1223. Research and development managers
  13. Production and specialised services managers
    131. Production managers in agriculture, forestry and fisheries
      1311. Agricultural and forestry production managers
      1312. Aquaculture and fisheries production managers
    132. Manufacturing, mining, construction, and distribution managers
      1321. Manufacturing managers
      1322. Mining managers
      1323. Construction managers
      1324. Supply, distribution and related managers
    133. Information and communications technology service managers
      1330. Information and communications technology service managers
    134. Professional services managers
      1346. Financial and insurance services branch managers
      1349. Professional services managers not elsewhere classified
  14. Hospitality, retail and other services managers
    142. Retail and wholesale trade managers
      1420. Retail and wholesale trade managers
    143. Other services managers
      1431. Sports, recreation and cultural centre managers
      1439. Services managers not elsewhere classified
2. Professionals
  21. Science and engineering professionals
    211. Physical and earth science professionals
      2111. Physicists and astronomers
      2112. Meteorologists
      2113. Chemists
      2114. Geologists and geophysicists
    212. Mathematicians, actuaries and statisticians
      2120. Mathematicians, actuaries and statisticians
    213. Life science professionals
      2131. Biologists, botanists, zoologists and related professionals
    214. Engineering professionals (excluding electrotechnology)
      2141. Industrial and production engineers
  24. Business and administration professionals
    241. Finance professionals
      2411. Accountants
      2412. Financial and investment advisers
      2413. Financial analysts
    242. Administration professionals
      2421. Management and organization analysts
      2422. Policy administration professionals
      2423. Personnel and careers professionals
    243. Sales, marketing and public relations professionals
      2431. Advertising and marketing professionals
      2433. Technical and medical sales professionals (excluding ICT)
      2434. Information and communications technology sales professionals
  25. Information and communications technology professionals
    251. Software and applications developers and analysts
      2511. Systems analysts
      2512. Software developers
      2514. Applications programmers
      2519. Software and applications developers and analysts not elsewhere classified
    252. Database and network professionals
      2521. Database designers and administrators
      2522. Systems administrators
      2529. Database and network professionals not elsewhere classified
  26. Legal, social and cultural professionals
    263. Social and religious professionals
      2631. Economists
      2632. Sociologists, anthropologists and related professionals
      2633. Philosophers, historians and political scientists
3. Technicians and associate professionals
  33. Business and administration associate professionals
    331. Financial and mathematical associate professionals
      3311. Securities and finance dealers and brokers
      3312. Credit and loans officers
      3313. Accounting associate professionals
      3314. Statistical, mathematical and related associate professionals
      3315. Valuers and loss assessors
4. Clerical support workers
  43. Numerical and material recording clerks
    431. Numerical clerks
      4311. Accounting and bookkeeping clerks
      4312. Statistical, finance and insurance clerks

Total: 4 major groups, 9 sub-major groups, 21 minor groups, 52 unit groups
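Because ISCO-08 codes nest by prefix (the first digit gives the major group, the first two digits the sub-major group, the first three the minor group), the selection in Table 34 can be held as a set of 4-digit unit-group codes and applied directly to employment microdata. The sketch below is a minimal illustration under assumptions: the input of (ISCO code, employees) pairs is hypothetical, and the result is only the pool of occupations in which data workers may be found, not the report's estimate of data workers.

```python
# Hypothetical helper built from Table 34: the 52 selected ISCO-08 unit groups.
SELECTED_UNIT_GROUPS = frozenset({
    "1120", "1211", "1213", "1219", "1221", "1222", "1223",
    "1311", "1312", "1321", "1322", "1323", "1324", "1330",
    "1346", "1349", "1420", "1431", "1439",
    "2111", "2112", "2113", "2114", "2120", "2131", "2141",
    "2411", "2412", "2413", "2421", "2422", "2423",
    "2431", "2433", "2434", "2511", "2512", "2514", "2519",
    "2521", "2522", "2529", "2631", "2632", "2633",
    "3311", "3312", "3313", "3314", "3315", "4311", "4312",
})


def isco_levels(code: str) -> dict:
    """Split a 4-digit ISCO-08 unit-group code into its parent levels."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"not a 4-digit ISCO-08 unit group: {code!r}")
    return {
        "major": code[:1],
        "sub_major": code[:2],
        "minor": code[:3],
        "unit": code,
    }


def pool_employment(records) -> float:
    """Sum employees whose unit group is in the Table 34 selection.

    `records` is an iterable of (isco_code, employees) pairs, e.g. from a
    hypothetical Labour Force Survey extract.
    """
    return sum(emp for code, emp in records if code in SELECTED_UNIT_GROUPS)


if __name__ == "__main__":
    print(isco_levels("2521"))
    sample = [("2521", 1200.0), ("2512", 3400.0), ("5120", 9000.0)]
    print(pool_employment(sample))  # the 5120 record is not in the selection
```

The same prefix logic can be used to aggregate results back up to minor or major groups.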


Annex 3 Data companies – Selected codes from NACE Rev. 2

INDUSTRIES | Supply of data products and services | Demand of data products and services

SECTION A – AGRICULTURE, FORESTRY AND FISHING | Excluded | May be included in the case of large producers
SECTION B – MINING AND QUARRYING | Excluded | Included, especially for division 06 (crude petroleum, gas)
SECTION C – MANUFACTURING | Excluded | Included
SECTION D – ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY | May be included in the case of business units that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
SECTION E – WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
SECTION F – CONSTRUCTION | May be included in the case of business units/start-ups that collect and aggregate data products and services, especially for utility projects; for the purposes of this study, however, it is currently excluded | Included
SECTION G – WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
SECTION H – TRANSPORTATION AND STORAGE | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
SECTION I – ACCOMMODATION AND FOOD SERVICE ACTIVITIES | Excluded | May be included, but not relevant
SECTION J – INFORMATION AND COMMUNICATION | Included | Included

Division / Group / Class
58 Publishing activities | Included | Included
58.1 Publishing of books, periodicals and other publishing activities | Excluded | Included
58.11 Book publishing | Excluded | Included
58.12 Publishing of directories and mailing lists | Included | Included
58.13 Publishing of newspapers | Excluded | Included
58.14 Publishing of journals and periodicals | Excluded | Included
58.19 Other publishing activities | Excluded | Included
58.2 Software publishing | Excluded | Included
58.21 Publishing of computer games | Excluded | Included
58.29 Other software publishing | Excluded | Included
59 Motion picture, video and television programme production, sound recording and music publishing activities | Excluded | Included
59.1 Motion picture, video and television programme activities | Excluded | Included
59.11 Motion picture, video and television programme production activities | Excluded | Included
59.12 Motion picture, video and television programme post-production activities | Excluded | Included
59.13 Motion picture, video and television programme distribution activities | Excluded | Included
59.14 Motion picture projection activities | Excluded | Included
59.2 Sound recording and music publishing activities | Excluded | Included
59.20 Sound recording and music publishing activities | Excluded | Included
60 Programming and broadcasting activities | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
60.1 Radio broadcasting | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
60.10 Radio broadcasting | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
60.2 Television programming and broadcasting activities | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
60.20 Television programming and broadcasting activities | May be included in the case of business units/start-ups that collect and aggregate data products and services; for the purposes of this study, however, it is currently excluded | Included
61 Telecommunications | May be included in the case of business units/start-ups | Included
61.1 Wired telecommunications activities | May be included in the case of business units/start-ups | Included
61.10 Wired telecommunications activities | May be included in the case of business units/start-ups | Included
61.2 Wireless telecommunications activities | May be included in the case of business units/start-ups | Included
61.20 Wireless telecommunications activities | May be included in the case of business units/start-ups | Included
61.3 Satellite telecommunications activities | May be included in the case of business units/start-ups | Included
61.30 Satellite telecommunications activities | May be included in the case of business units/start-ups | Included
61.9 Other telecommunications activities | May be included in the case of business units/start-ups | Included
61.90 Other telecommunications activities | May be included in the case of business units/start-ups for the collection and delivery of data products and services | Included
62 Computer programming, consultancy and related activities | Included | Included
62.0 Computer programming, consultancy and related activities | Included | Included
62.01 Computer programming activities | Included | Included
62.02 Computer consultancy activities | Included | Included
62.03 Computer facilities management activities | Included | Included
62.09 Other information technology and computer service activities | Included | Included
63 Information service activities | Included | Included
63.1 Data processing, hosting and related activities; web portals | Included | Included
63.11 Data processing, hosting and related activities | Included | Included
63.9 Other information service activities | Included | Included
63.91 News agency activities | Excluded | Included
63.99 Other information service activities n.e.c. | Included | Included

SECTION K – FINANCIAL AND INSURANCE ACTIVITIES | May be included in the case of business units/start-ups for the collection and delivery of data products and services | Included
SECTION L – REAL ESTATE ACTIVITIES | Excluded | Included
SECTION M – PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES

Division / Group / Class
69 Legal and accounting activities | Excluded | Included
69.1 Legal activities | Excluded | Included
69.10 Legal activities | Excluded | Included
69.2 Accounting, bookkeeping and auditing activities; tax consultancy | Excluded | Included
69.20 Accounting, bookkeeping and auditing activities; tax consultancy | Excluded | Included
70 Activities of head offices; management consultancy activities | Included | Included
70.1 Activities of head offices | Excluded | Included
70.10 Activities of head offices | Excluded | Included
70.2 Management consultancy activities | Included | Included
70.21 Public relations and communication activities | Excluded | Included
70.22 Business and other management consultancy activities | Included | Included
71 Architectural and engineering activities; technical testing and analysis
71.1 Architectural and engineering activities and related technical consultancy | Excluded | Included
71.11 Architectural activities | Excluded | Included
71.12 Engineering activities and related technical consultancy | Excluded | Included
72 Scientific research and development | Included | Excluded
72.2 Research and experimental development on social sciences and humanities | Included | Included
72.20 Research and experimental development on social sciences and humanities | Included | Included
73 Advertising and market research | Included | Included
73.1 Advertising | Included | Included
73.11 Advertising agencies | Excluded | Included
73.12 Media representation | Excluded | Excluded
73.2 Market research and public opinion polling | Included | Included
73.20 Market research and public opinion polling | Included | Included
74 Other professional, scientific and technical activities | Included | Excluded
74.1 Specialised design activities | Excluded | Excluded
74.10 Specialised design activities | Excluded | Excluded
74.2 Photographic activities | Excluded | Excluded
74.20 Photographic activities | Excluded | Excluded
74.3 Translation and interpretation activities | Excluded | Excluded
74.30 Translation and interpretation activities | Excluded | Excluded
74.9 Other professional, scientific and technical activities n.e.c. | Included | Excluded
74.90 Other professional, scientific and technical activities n.e.c. | Included | Excluded

SECTION N – ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
77 Rental and leasing activities | Excluded | Included
78 Employment activities | Excluded | Excluded
79 Travel agency, tour operator and other reservation service and related activities | May be included in the case of business units/start-ups for the collection and delivery of data products and services | Included
80 Security and investigation activities | Excluded | Included
81 Services to buildings and landscape activities | Excluded | Excluded
82 Office administrative, office support and other business support activities | Included | Included
82.1 Office administrative and support activities | Excluded | Excluded
82.11 Combined office administrative service activities | Excluded | Excluded
82.19 Photocopying, document preparation and other specialised office support activities | Excluded | Excluded
82.2 Activities of call centres | Excluded | Included
82.20 Activities of call centres | Excluded | Included
82.3 Organisation of conventions and trade shows | Excluded | Included
82.30 Organisation of conventions and trade shows | Excluded | Included
82.9 Business support service activities n.e.c. | Excluded | Included
82.91 Activities of collection agencies and credit bureaus | Excluded | Included
82.92 Packaging activities | Excluded | Excluded
82.99 Other business support service activities n.e.c. | Excluded | Excluded

SECTION O – PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY | Excluded | Excluded
SECTION P – EDUCATION | Excluded | Excluded
SECTION Q – HUMAN HEALTH AND SOCIAL WORK ACTIVITIES | Excluded | Excluded
SECTION R – ARTS, ENTERTAINMENT AND RECREATION | Excluded | Excluded
SECTION S – OTHER SERVICE ACTIVITIES | Excluded | Excluded
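The supply/demand flags in this annex could be stored as a lookup keyed by NACE Rev. 2 code. The sketch below holds only a small illustrative excerpt of the table, and, as an assumption not stated in the annex, falls back from a class (e.g. 58.11) to its group (58.1) and then its division (58) when a code is not listed explicitly; `Coverage`, `parent` and `lookup` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional


class Coverage(NamedTuple):
    supply: str  # "Included", "Excluded" or a conditional note
    demand: str


# Illustrative excerpt of Annex 3, keyed by NACE Rev. 2 code (not complete).
NACE_COVERAGE: Dict[str, Coverage] = {
    "58": Coverage("Included", "Included"),
    "58.1": Coverage("Excluded", "Included"),
    "58.12": Coverage("Included", "Included"),
    "62": Coverage("Included", "Included"),
    "63": Coverage("Included", "Included"),
    "63.91": Coverage("Excluded", "Included"),
    "73.12": Coverage("Excluded", "Excluded"),
}


def parent(code: str) -> Optional[str]:
    """Parent of a NACE code: class 58.12 -> group 58.1 -> division 58."""
    if "." not in code:
        return None  # a division has no parent below section level
    division, rest = code.split(".")
    return f"{division}.{rest[:-1]}" if len(rest) > 1 else division


def lookup(code: str) -> Optional[Coverage]:
    """Return the most specific listed entry, walking up the hierarchy."""
    current: Optional[str] = code
    while current is not None:
        if current in NACE_COVERAGE:
            return NACE_COVERAGE[current]
        current = parent(current)
    return None


if __name__ == "__main__":
    print(lookup("58.12"))  # listed explicitly
    print(lookup("58.11"))  # falls back to group 58.1
    print(lookup("62.01"))  # falls back to division 62
```

Because the annex lists most classes explicitly, the fallback mainly matters for codes that are omitted (for example 63.12); whether such codes should inherit their parent's flag is a judgment the study would need to confirm.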


Annex 4 Peer Review

Name of the peer reviewer: Francesco Daveri

Date of peer review: July 16, 2014

General opinion

Is the report complete, according to the expectations shown in the Inception report? Does it respond to the objectives set forth in the work plan?

The Methodology report is fully consistent with the expectations and objectives stated in the Inception Report, which in turn were designed to match the objectives of the original EC tender and break them down into sub-objectives.

A methodology (framework and taxonomy) is developed to monitor the European data market (EDM). This methodology includes a list of six quantitative indicators, the identification of the data sources and the analysis of potential data gaps detailed by indicator.

The role of the descriptive “parable” stories is also mentioned in the first pages of the report, as well as the relevant steps to develop the stakeholder community in order to contribute to the development of the EDM monitoring tool and the descriptive stories. These issues will be developed further at a later stage, I believe.

Is the report clear and coherent in its statements, assessments and arguments?

The statements and the main arguments in the Methodology report are clearly and coherently expressed.

The definitions reported on pages 8-9 usefully recall the OECD work on the definitions of the Knowledge Economy and adapt it to the Data Economy.

The assessment criteria of availability, reliability, value added and scalability-comparability-representativeness are a sensible set of criteria.

Figure 2 presents the “data value chain and the ecosystem”, which in economists’ jargon may be called the crucial demand and supply framework of the EDM (and its intermediate inputs). This is at the heart of the report.

Are the language and the format of the report of good quality?

The report is professionally crafted both in terms of language and format.

(The outline of the Inception Report contained a slight misnomer. To be more informative, section 2.4 should be titled “Monitoring the European Data Market: Finalization and implementation”)


Review of Chapters

Chapter 1 Introduction + Annex 1 Taxonomy

What is your opinion on this chapter? Is the methodology appropriate? Is the chapter coherent, relevant, credible and well presented?

What is your opinion on the key definitions and the taxonomy presented in Annex I? Are they complete, clear, coherent, credible and useful for the objectives of the study?

This chapter is well conceived and executed.

The detailed definitions in Annex 1 are very careful and, as far as I can judge, complete, since they draw on existing OECD work, rephrased and adapted within the Data Economy framework.

The section on Data Products, Services and Tools is appropriately left “in progress”.

In the section listing and detailing the Main Stakeholders (pp. 76-77), I am not sure I understand the reason for distinguishing between ICT enablers (defined as “ICT stakeholders”, a broader category) and Enabling players. At the very least, this is not well explained.

Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Chapter

I suggest clarifying the distinction between ICT enablers and Enabling players here (I understand that it is exploited later on) or dropping it altogether.

Chapter 2 Design of Indicators

All sections helpfully share the same structure (definition, description, data sources, data gaps, measurement assumptions).

Section 2.1 contains a rather far-fetched description of the rationale for a micro-founded approach. The passage that should (at least) be reformulated is the following: “In order to adopt a micro-foundation approach, we need to examine the impacts of the adoption of data-related products and services on the economic agents. To do so, the micro-economic analysis requires an analysis at supply-side level and at demand-side level. The effects on the economic system are by the way much more rapid and measurable on the supply-side than on the demand-side.”

A micro-founded approach would first study the motivation to adopt or produce data-related products and services, and then its consequences, which may also be the result of market outcomes. This is how I understand what a microeconomic approach to market analysis boils down to. The last sentence of the quotation is also not well explained. The idea, I think, is that the building blocks of the EDM are measured on the supply side because this is easier to do. This becomes clearer in the second half of the report, but at this point the reader has to take the sentence for granted. Clarifying this upfront is important, I think, because the whole chapter rests on the approach adopted here.


INDICATOR 1 Number of data workers

What is your opinion on the methodological approach for Indicator 1? Is it appropriate, coherent, relevant, credible and well presented?

Definition and statistical reference

Very carefully thought-out and executed

Description of the indicator

Very carefully thought-out and executed

Main data sources

As to the data sources, a rather standard problem arises here: what to do when there are missing cells at the desired level of disaggregation. The solution proposed in the report is the following: “In those cases in which these two datasets prove to be insufficient to form solid assumptions, IDC will leverage and calibrate assumptions based on US labour statistics office data, which provides a higher level of granularity of US employees by occupation and vertical market and could therefore be used both to formulate and, more importantly, to validate assumptions about the EU labor market.” I have direct experience, with ICT investment data, of how mechanically transferring US data to the EU setting may lead to arbitrary (and so potentially wrong and/or misleading) conclusions. In the past I have had long and heated discussions with Bart van Ark and his Groningen group on these issues, which are commonplace when new data must be generated and survey data are unavailable.
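The gap-filling step quoted above, and the reviewer's concern about it, can be made concrete with a minimal sketch: a missing EU cell (industry × occupation) is imputed as the EU industry total multiplied by the corresponding US occupation share. The function name, its inputs and the figures are hypothetical and do not describe IDC's actual calibration; returning the list of imputed cells keeps the mechanically transferred values visible, so that they can be checked separately (for example through expert interviews).

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str]  # (industry, ISCO occupation), hypothetical keys


def fill_with_us_shares(
    eu_cells: Dict[Cell, Optional[float]],
    eu_industry_totals: Dict[str, float],
    us_shares: Dict[Cell, float],
) -> Tuple[Dict[Cell, float], List[Cell]]:
    """Fill missing EU cells as EU industry total x US occupation share.

    Returns the completed table and the list of imputed cells, so that
    US-derived values can be flagged rather than mixed silently with
    observed EU data.
    """
    filled: Dict[Cell, float] = {}
    imputed: List[Cell] = []
    for cell, value in eu_cells.items():
        industry, _occupation = cell
        if value is None:
            # Mechanical transfer: assumes the US occupational mix holds in the EU.
            value = eu_industry_totals[industry] * us_shares[cell]
            imputed.append(cell)
        filled[cell] = value
    return filled, imputed


if __name__ == "__main__":
    eu = {("K", "2120"): None, ("K", "2521"): 800.0}
    totals = {"K": 100_000.0}
    shares = {("K", "2120"): 0.005, ("K", "2521"): 0.01}
    filled, imputed = fill_with_us_shares(eu, totals, shares)
    print(filled)
    print("imputed:", imputed)
```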

Gap analysis

Compared to other indicators, data availability here is high or medium.

Measurement approach/field research/interviews

The estimation procedure is clearly spelled out.

There are many assumptions to be made. Given their number, it seems impossible to me to test the relevance of each individual assumption one at a time, which would be desirable. The final result is likely to be a black box, namely the result of a number of joint untested assumptions.

Here are the most relevant sentences or assumptions, each followed by my comment:

“As a first approximation, the number and share of data workers depends on occupational mix and does not depend on industry mix or country mix”. I do not understand this sentence.

“There is a positive correlation between the number of data workers and the Business Analytics software revenues” OK

“Data workers are positively correlated with the software revenues” OK

“The number of data workers is correlated with industry specific characteristics” OK, but which ones?

“Data workers may be employed in both SMEs and large companies; during the early stage of the adoption curve, the use of data and the diffusion of data workers is higher in large companies; as the adoption curve progresses, SMEs increase their use and production of data products and services, and the number of data workers increases.” This seems weak and logically questionable.

“The number of data workers does not relate to country-specific aspects, except after controlling for occupation-specific and industry-specific aspects (e.g. related to firm size composition).” Again, I have a hard time understanding this.
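One way to address the concern that many joint assumptions cannot be tested individually is a one-at-a-time sensitivity check: move one assumed parameter while holding the others fixed, and rank the assumptions by how much they shift the total. The simple model below (data workers = sum over occupations of employment × assumed data-worker share) and all figures are hypothetical; it is not the report's estimation procedure, only an illustration of how the weight of each assumption could be made visible.

```python
from typing import Dict, Tuple


def estimate(employment: Dict[str, float], shares: Dict[str, float]) -> float:
    """Data workers = sum over occupations of employment x assumed share."""
    return sum(employment[occ] * shares[occ] for occ in employment)


def one_at_a_time(
    employment: Dict[str, float],
    shares: Dict[str, float],
    rel_change: float = 0.2,
) -> Tuple[float, Dict[str, Tuple[float, float]]]:
    """Move each share by -/+ rel_change, one at a time, others held fixed.

    Returns the baseline estimate and, per occupation, the (low, high)
    change in the total caused by that single assumption.
    """
    base = estimate(employment, shares)
    swings: Dict[str, Tuple[float, float]] = {}
    for occ, share in shares.items():
        low = {**shares, occ: share * (1 - rel_change)}
        high = {**shares, occ: share * (1 + rel_change)}
        swings[occ] = (estimate(employment, low) - base,
                       estimate(employment, high) - base)
    return base, swings


if __name__ == "__main__":
    employment = {"2120": 50_000, "2521": 120_000, "1211": 200_000}
    shares = {"2120": 0.8, "2521": 0.5, "1211": 0.1}
    base, swings = one_at_a_time(employment, shares)
    print("baseline:", round(base))
    # Rank assumptions by the width of the range each one induces.
    ranked = sorted(swings.items(), key=lambda kv: kv[1][1] - kv[1][0], reverse=True)
    for occ, (lo, hi) in ranked:
        print(occ, round(lo), round(hi))
```

A one-at-a-time check ignores interactions between assumptions, so it complements rather than replaces validation against independent data.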


Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Indicator:

The whole undertaking is huge and generally well-conceived.

But I don’t like the list of circumstantial assumptions used to fill the data gaps. With a circumstantial list, the validity of individual assumptions becomes untestable, given their large number. I conjecture that the collection of assumptions is somewhat dictated by necessity, that is, by the lack of information. At times, though, one more blank is better than a falsely reassuring datum that is not really a fact.

I would in any case recommend being cautious in extrapolating US data to the EU setting to fill the data gaps. Here expert interviews may indeed help, though.

The English is usually good or very good, with occasional inaccuracies.

Section 2.1 should at least be edited in several places. For example: “Micro-economic models are more appropriate to predict the impact of both policy changes.” Why? The important thing is to create a quasi-experimental setting. If this is done, macro data will do as well as micro data; without a quasi-experimental setting, even micro data will fail.

“and of emerging markets such as data market at aggregate level” Unclear sentence.

“One of the reasons is that it is still not clear how to assess the net effects of innovation paths in the economic system.” Unclear sentence

p.15 “at the very last classification level” should perhaps be “lowest” rather than last.

INDICATOR 2 and 3 Number and revenues of data companies

What is your opinion on the methodological approach for Indicator 2 and 3? Is it appropriate, coherent, relevant, credible and well presented?

Definition and statistical reference

The decision to leave the media and publishing industry out of the picture seems appropriate to me.

The definition of “datavore firm” seems elusive, pretty much as any definition of ICT-intensive firm or industry used to be.

Description of the indicator

The choices that were made make sense to me

Main data sources

The choices that were made are careful as far as I can judge

Gap analysis

Data availability is much more of an issue for indicator 3 than for indicators 1 and 2.

Measurement approach/field research/interviews

Under the constraints spelled out on pp. 28-29 (supply and demand are at times hard to define separately), the choice made here looks sensible to me: “The indicator of data companies will include only the supply-side data companies (as defined above), excluding the intensive data users (the ‘datavores’), even if they sell their own data, because their core business is not in the data market. In other words we aim at measuring the actors of the data industry.”


Pending data collection through the indicated surveys, the list of assumptions to be made here looks shorter and simpler than for indicator 1.

Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Indicator:

It appears that the missing revenue and productivity data will either have to be provided through separate surveys and fresh data collection, whose design is exemplified or described in detail at pp. 37-39, or will not be provided at all. If any company has this expertise, IDC is the one. Given the constraints, I see no alternative to this choice.

Extrapolating any country's findings to other, unobserved countries is subject to the usual caveats.

INDICATOR 4 Size of the data market

What is your opinion on the methodological approach for Indicator 4? Is it appropriate, coherent, relevant, credible and well presented?

Definition and statistical reference

Description of the indicator

Main data sources

Gap analysis

Measurement approach/field research/interviews

Indicator 4.1: the problem is well posed, not only for current purposes but also for future ones. As discussed at p. 46, imports and exports may not be very important for data-related services today, but they may become so in the future. It is thus entirely appropriate to provide an accounting and data collection framework to meet future needs.

Indicators 4.2-4.3 (value of the data economy): very desirable to calculate. The impact is presumably small, given the relatively small size of the data industry, but the calculation is still worth doing, with an eye to capturing future industry developments.

Are you aware of any further data or literature that you suggest to include?

I wonder whether the joint and individual work carried out by Harald Gruber and Pantelis Koutroumpis (http://www.imperial.ac.uk/AP/faces/pages/read/Home.jsp?person=p.koutroumpis&_adf.ctrl-state=jmrezclnl_3) on the diffusion of internet services and mobile phones could be useful, also for scenario purposes.

Recommendations for revision - further development of the Indicator:

I tend to see multipliers as fragile objects, so envisaging some sensitivity analysis of their results would be useful (although not for current purposes).
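A minimal sketch of the sensitivity analysis suggested here follows. It assumes, hypothetically, that the indirect or induced value is obtained by applying a single multiplier to a direct value. The functions and the ±20% grid are illustrative choices, not the report's method. Reporting the full range rather than only the central estimate makes the fragility of the multiplier visible.

```python
from typing import Dict, Sequence, Tuple


def multiplier_sensitivity(
    direct_value: float,
    multiplier: float,
    rel_shifts: Sequence[float] = (-0.2, -0.1, 0.0, 0.1, 0.2),
) -> Dict[float, float]:
    """Total value implied by the multiplier perturbed by each relative shift.

    Hypothetical model: total value = direct_value * multiplier, with the
    multiplier scaled by (1 + shift) for each shift (e.g. -0.2 means -20%).
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return {s: direct_value * multiplier * (1.0 + s) for s in rel_shifts}


def value_range(results: Dict[float, float]) -> Tuple[float, float]:
    """Lowest and highest total value across the tested shifts."""
    values = list(results.values())
    return min(values), max(values)
```

For a direct value of 100 and a multiplier of 1.5, the central estimate is 150, while a ±20% shift in the multiplier moves the result between 120 and 180.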

Section 2.4 is particularly well organized and written.

INDICATOR 5 Data workers skills gap

What is your opinion on the methodological approach for Indicator 5? Is it appropriate, coherent, relevant, credible and well presented?

Definition and statistical reference

Description of the indicator

Main data sources

Gap analysis

Measurement approach/field research/interviews

Most details for this indicator come from previous parts of the report.

The missing piece of information for calculating the skills gap is vacancies, which should come from the survey on data suppliers.

How vacancies will be estimated in the data user companies is left vague, presumably to be validated by talking to field experts.

Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Indicator:

I see no changes to be made.

INDICATOR 6 Citizens’ reliance on the data market

What is your opinion on the methodological approach for Indicator 6? Is it appropriate, coherent, relevant, credible and well presented?

Definition and statistical reference

Description of the indicator

Main data sources

Gap analysis

Measurement approach/field research/interviews

If the aim is to collect data on usage, this should be conceptually straightforward. Yet, as rightly emphasized in the report, it may be empirically difficult to find data answering the specific questions raised here.

Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Indicator:

This task seems to me very hard to accomplish. I do not attach a precise empirical meaning to the phrase “reliance on the data market”, while the meaning of “use” is clear to me. This distinction should therefore be clarified, as it is not obvious.

Chapter 2.8 – Data collection overview of field research

What is your opinion on this methodology? Is it appropriate, coherent, relevant, credible and well presented?

The proposed surveys (if feasible) and interviews seem appropriate for the purpose.

Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Chapter

The envisaged enterprise surveys would be important complements for achieving some of the project results.

Chapter 2.9 Forecasting indicators

What is your opinion on this paragraph? Is the methodology approach appropriate? Is it coherent, relevant, credible and well presented?

The proposed forecasting methodology is correct and sensible.

Are you aware of any further data or literature that you suggest to include?

No

Recommendations for revision - further development of the Chapter

I would recommend specifying the data source for macroeconomic trends (IMF or other). I would take consensus forecasts rather than develop project-specific ones. After all, IDC's comparative advantage lies in forecasting IT and technology trends, not so much in macroeconomics. Developing an individual macro benchmark would simply make the scenarios less transparent.

I would also recommend maximum transparency in incorporating assumptions about the elasticities linking the various policy and regulatory trends (including the legal framework) to the outcome variables. The truth is that we do not know these elasticities. It is better to stay as far away as possible from the black-box syndrome by making clear what is in and what is left out of the picture.

Finally, allow for large standard errors.
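A minimal sketch, under stated assumptions, of how these recommendations could be combined follows. Each driver (for example a consensus macro forecast or a regulatory trend) is listed explicitly with its assumed growth and a low/central/high range for its elasticity, and the outcome's growth is approximated to first order as the sum of elasticity × driver growth. The first-order additive form, the class and function names, and the use of elasticity ranges in place of formal standard errors are all illustrative assumptions, not the report's forecasting model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Driver:
    """One explicit assumption: a driver's growth rate and the outcome's
    elasticity with respect to it, given as a low/central/high range."""
    name: str
    growth: float  # growth rate as a fraction, e.g. 0.015 for 1.5%
    elasticity_low: float
    elasticity_central: float
    elasticity_high: float

    def __post_init__(self) -> None:
        if not self.elasticity_low <= self.elasticity_central <= self.elasticity_high:
            raise ValueError(f"{self.name}: elasticities must satisfy low <= central <= high")


def project_growth(drivers: Iterable[Driver]) -> Tuple[float, float, float]:
    """First-order outcome growth: sum of elasticity * driver growth.

    Returns (low, central, high). For each driver the bounds take the smaller
    and larger of its two extreme contributions, so a negative driver growth
    is handled correctly.
    """
    low = central = high = 0.0
    for d in drivers:
        a, b = d.elasticity_low * d.growth, d.elasticity_high * d.growth
        low += min(a, b)
        high += max(a, b)
        central += d.elasticity_central * d.growth
    return low, central, high


def describe(drivers: Iterable[Driver]) -> List[str]:
    """One line per assumption, so readers can see what is in the picture."""
    return [
        f"{d.name}: growth {d.growth:+.2%}, elasticity "
        f"{d.elasticity_low} / {d.elasticity_central} / {d.elasticity_high}"
        for d in drivers
    ]
```

Publishing the `describe` output next to the low/central/high projection keeps every elasticity assumption visible and gives a simple, if crude, way of allowing for wide uncertainty when the elasticities themselves are unknown.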

Chapter 3 – Design of the Monitoring Tool

What is your opinion on this chapter? Is the methodology approach appropriate? Is it coherent, relevant, credible and well presented?

Are you aware of any further data or literature that you suggest to include?

Recommendations for revision - further development of the Chapter

I have no further recommendations in addition to the ones listed above when making comments on the previous chapter.

Main References

“Beyond computation: information technology, organizational transformation and business performance”, Brynjolfsson E, Hitt LM. 2000. J. Econ. Perspect. 14:23–48

“Beyond Goods and Services: The (Unmeasured) Rise of the Data-driven Economy”, Dr. M. Mandel (2012)

Big Data, A New World of Opportunities, NESSI White Paper, December 2012

“Big Data, Big Impact: New Possibilities for International Development”, The World Economic Forum (2012)

“Commercial exploitation of Europe’s public sector information”, PIRA Study, Study commissioned by the European Commission, Directorate General for the Information Society, reporting on the commercial value of European PSI (2000)

Data value chain: European strategy, European Commission, Directorate General Communication, Networks, and Technologies, http://ec.europa.eu/dgs/connect/en/content/data-value-chain-european-strategy

Enterprise Knowledge Workers: Understanding risks and opportunities, An Economist Intelligence Unit report sponsored by SAP, 2007

European Big Data Value Strategic Research & Innovation Agenda, European Big Data Value cPPP - Strategic Research and Innovation Agenda - April 2014

“Exploring Data-Driven Innovation as a New Source of Growth: Mapping the Policy Issues raised by Big Data”, OECD Digital Economy Papers (2013), n. 222 – OECD Publishing - http://search.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=DSTI/ICCP%282012%299/FINAL&docLanguage=En

“Exploring the Economics of Personal Data: a Survey of Methodologies for Measuring Monetary Value”, OECD Digital Economy Papers, (2013), n. 220 – OECD Publishing - http://www.oecd-ilibrary.org/docserver/download/5k486qtxldmq.pdf?expires=1397492426&id=id&accname=guest&checksum=8AD63B78366867914348D8B1A82AF27B

Framing a European Partnership for a Big Data Value Ecosystem, Vision for a European Big Data Value Partnership - February 2014

IBM Research, “Global Technology Outlook 2013”

ICT JOBS AND SKILLS New estimates and the work ahead, Working Party on Indicators for the Information Society, Paris, 9-10 December, 2013, OECD Headquarters

ICT, JOBS AND SKILLS PROPOSALS FOR A RESEARCH AGENDA, Paris, OECD Headquarters, 16 June 2014, Working Party on Measurement and Analysis of the Digital Economy

IDC, "IDC Predictions 2013: Competing on the Third Platform", www.idc.com

IDC, “A 3x3 Opportunity Matrix for Big Data and Analytics”, IDC #247766 (2014)

IDC, Big Data Adoption And Future Developments In Western European Vertical Markets, IDC Opinion #M08V (2013)

IDC, “Big Opportunities and Big Challenges: Recommendations for Succeeding the Multibillion-Dollar Big Data Market”, IDC #237885 (2012)

IDC, Big Data Drivers, Barriers, and Key Use Cases in Western European Vertical Markets, IDC Opinion #M01W (2014)

IDC, “China Big Data Technology and Services 2012-2016 Forecast and Analysis”, IDC #CN2670201U (2012)

IDC, “IDC Maturity Model: Big Data and Analytics – A Guide to Unlocking Information Access”, IDC #239771 (2013)

IDC, The Future of Big Data and Analytics – Presentation by H. Morris, IDC Web Conference on May 22 2014

IDC, Western Europe 3rd Platform Solutions by Vertical Market, 2014: The Future Is Now, Presentation by A. Siviero, IDC

IDC, Worldwide Big Data Technology and Services 2012-2015 Forecast, IDC #233485 (2012)

IDC, Worldwide Big Data Technology and Services 2013–2017 Forecast, IDC #244979 (2013)

IDC, IDC's Worldwide Storage and Big Data Taxonomy, IDC Opinion (2014) #245938

Information Rules: A Strategic Guide to the Network Economy. Shapiro C, Varian HR. 1999 Boston, MA: Harvard Bus. Sch. Press

“Information technology as a factor of production: the role of differences among firms”, Brynjolfsson E, Hitt LM. 1995. Econ. Innov. New Technol. 3:183–200

McKinsey Global Institute, “Disruptive technologies: Advances that will transform life, business, and the global economy” (2013)

MEASURING BIG DATA RELATED INDUSTRIES: AN EXPLORATION Issues and ways forward, Working Party on Indicators for the Information Society, 12-14 December 2012, OECD Headquarters, 2 rue André-Pascal, 75016 Paris

MEASURING THE DIGITAL ECONOMY: A NEW PERSPECTIVE, Working Party on Measurement and Analysis of the Digital Economy, Paris, OECD Headquarters, 16 June 2014

“Measuring the Internet Economy: A Contribution to the Research Agenda”, OECD Digital Economy Papers (2013), No. 226, OECD Publishing - http://dx.doi.org/10.1787/5k43gjg6r8jf-en

“MEPSIR Measuring European Public Sector Information Resources”, Final Report of Study on Exploitation of public sector information – benchmarking of EU framework conditions, 2006

Micro-foundations for management research: What, why, and whither?, Nicolai J. Foss, 2004

“Quantitative Estimates of the Demand for Cloud Computing in Europe and the Likely Barriers to Up-take”, Study commissioned by the European Commission, Directorate General Communication, Networks, and Technologies to IDC (2012)

“Rise of the Datavores, How UK Business Analyse and Use Online Data”, Hasan Bakhshi and Juan Mateos–Garcia for Nesta, 2012

“Study on e-Procurement Measurement and Benchmarking, Lot 1 – Public Procurement Performance Indicators” MARKT 2011/097/C, Study commissioned by the European Commission, Directorate General Internal Market and Services to IDC (2013)

The Big Data Talent Gap, Kenan Flagler Business School, 2013

“The Human Side of Big Data and High-Performance Analytics”, Research Report, T. H. Davenport, International Institute for Analytics (2012)

“The Knowledge Economy”, W. W. Powell (School of Education and Department of Sociology, Stanford University, Stanford, California, USA) and K. Snellman (Santa Fe Institute, Santa Fe, New Mexico, USA), Paper for the Annual Review of Sociology (2004) - http://www.stanford.edu/group/song/papers/powell_snellman.pdf

“The Socio-economic impacts of the Future Internet PPP in Europe”, http://www.fi3p.eu/
