video games presentation for policymaking in the big data era conference

Using big data to map the UK video games industry Juan Mateos Garcia and Hasan Bakhshi, 16 June 2015

Upload: juan-mateos-garcia

Post on 02-Aug-2015


TRANSCRIPT

Using big data to map the UK video games industry

Juan Mateos Garcia and Hasan Bakhshi, 16 June 2015

2

This presentation

Talks about Nesta + Ukie’s research mapping the UK games industry with web (biggish) data.

Focuses on data collection and compares results with what one would obtain using standard (SIC-based) approaches.

Less focused on reviewing all our findings. For that, you can download the full report here:

https://www.nesta.org.uk/sites/default/files/map_uk_games_industry_wv.pdf

3

1. Context

Exam question

To measure and map a fast-moving, innovative, entrepreneurial sector.

Opportunity

The ‘big data’ revolution:

• Unstructured web inputs
• Combining varied datasets
• Open, interactive outputs (datasets + platforms)

Audiences:

• Policymakers
• Industry
• Other innovation agents
• Researchers

4

2. How do we FIND UK games companies? Using official data

[Diagram: Business, Analyst, SIC Code, Data, Govt]

Do these SIC codes capture games companies? Some issues:

1. Inadequate SIC codes: games SIC codes only appeared in 2007.
2. Misclassification:
• Companies have no incentives to select the right SIC code.
• Companies straddle sectors (educational games, games app developers etc.)

Is this data relevant? Some issues:

1. It misses smaller companies.
2. Lags in the publication of the data (~1/2 years).
3. Data only available in an aggregate way. Not possible to identify companies (due to disclosure issues).
4. Data doesn’t include industry-relevant questions.

5

2. How do we FIND UK games companies? Using surveys

[Diagram: Industry expert, Analyst, Domain knowledge, Survey, Sample]

Excellent source of data, tried and tested methodology:

• Used in many policy-relevant reports.
• Allows targeting existing companies, and obtaining very relevant information.

Limitations:

• Very expensive
• Very low response rates
• Snapshot

6

2. How do we FIND UK games companies? Using web data (our approach)

[Diagram: Business, Analyst, Activity, Data, Web]

Advantages

• Definition not based on SIC codes but on economic/creative activity
• ‘Real-time’ data
• Relevant data

Not a silver bullet… as we will see.

7

An illustration of the pitfalls of web data

Several academic papers have used an approach similar to ours, but based on a single data source (MobyGames). However, MobyGames is heavily skewed towards older, niche gaming platforms rather than new, mainstream ones. This reflects biases in the platform’s user base.

8

2. How do we FIND UK games companies? Process

Data scraping carried out by external agency with IT + domain expertise. Analysis in-house.

9

2. How do we FIND UK games companies? Some observations

Not all observations are born equal:

• Matching companies from web sources with Companies House (CH) data is a probabilistic process.
• False positives/negatives are costly not just in terms of accuracy, but also of perceptions.
• Strategies to address this:
– Manual (expensive, stringent) verification of companies using web information: only 23% of companies verified (80% of those validated were correct).
– Decision tree (CHAID) to identify groups of companies similar to those verified positively: 546 companies added.
– Quality assurance with domain experts (Ukie): remove 17 companies (BBC, gambling companies); incorporate 184 companies with no web presence.
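To illustrate why matching web-sourced company names against a register is probabilistic, here is a minimal sketch using fuzzy string similarity from Python's standard library. It is not the matching procedure used in the research, which the slides do not describe. The suffix list, the 0.85 threshold, the function names and the example company names are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Legal suffixes stripped before comparison (illustrative list, an assumption).
SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalise(name: str) -> str:
    """Lower-case a name, drop punctuation and common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(web_name, register_names, threshold=0.85):
    """Return (register_name, score) for the closest register entry.

    The name is None when no candidate reaches the threshold. The
    threshold is a hypothetical value: raising it trades false
    positives for false negatives.
    """
    target = normalise(web_name)
    best_name, best_score = None, 0.0
    for candidate in register_names:
        score = SequenceMatcher(None, target, normalise(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


if __name__ == "__main__":
    # Made-up names, not real companies from the dataset.
    register = ["PIXEL FORGE LIMITED", "Pixel Forage Foods Ltd"]
    print(best_match("Pixel Forge Ltd.", register))
```

Any score below 1.0 leaves room for error in both directions, which is why the slide pairs automated matching with manual verification and expert review.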

10

4. Results: Coverage

We identify 1902 companies active in 2015 (cf. 1320 according to IDBR in 2013, ~500 in most domain-expert generated company lists).

Just over a third of companies are covered by official games SIC codes.

20% of the companies have no official SIC code yet, but are identified by our approach.
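The coverage split above (games SIC code, other SIC code, no SIC code yet) can be sketched as a simple classification over each company's registered codes. This is a hedged illustration, not the analysis behind the reported figures. The two codes in `GAMES_SIC_CODES` are assumed to be the UK SIC 2007 games codes (the slides do not list them), and the company data is made up.

```python
# Assumption: UK SIC 2007 codes taken to denote games activity.
# 58210 = publishing of computer games; 62011 = ready-made interactive
# leisure and entertainment software development.
GAMES_SIC_CODES = {"58210", "62011"}


def coverage_shares(companies):
    """Return the share of companies in each SIC coverage group.

    `companies` maps a company name to the list of SIC codes it has
    registered (an empty list means no SIC code yet).
    """
    if not companies:
        raise ValueError("need at least one company")
    counts = {"games_sic": 0, "other_sic": 0, "no_sic": 0}
    for sic_codes in companies.values():
        if not sic_codes:
            counts["no_sic"] += 1
        elif GAMES_SIC_CODES & set(sic_codes):
            counts["games_sic"] += 1
        else:
            counts["other_sic"] += 1
    total = len(companies)
    return {group: n / total for group, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical companies, for illustration only.
    sample = {
        "Studio A": ["62011"],
        "Studio B": ["58210", "62012"],
        "Studio C": ["62012"],
        "Studio D": ["74100"],
        "Studio E": [],
    }
    print(coverage_shares(sample))
```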

11

4. Results: Specialisation profile

Companies targeting emerging platforms (iOS) are less well-covered by games SICs than those targeting established platforms (consoles)

12

4. Results: Geography [1]

              breslq  ark.lq  idbrcount.lq
breslq          1.00    0.38          0.46
ark.lq          0.38    1.00          0.53
idbrcount.lq    0.46    0.53          1.00

Gini BRES  Gini IDBR  Gini WD
    0.929      0.898    0.801

Our approach shows a geography of the UK games industry echoing official data sources, but with less concentration
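For readers unfamiliar with the Gini coefficient as a concentration measure, the sketch below computes it from per-area company counts with the standard rank-weighted formula. It is an illustration only. The slides do not say which areas or which exact estimator were used, and the counts in the example are invented.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x is sorted in ascending order and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical company counts per area, not figures from the report.
    evenly_spread = [5, 5, 5, 5]
    one_hotspot = [0, 0, 0, 10]
    print(gini(evenly_spread))  # even spread gives 0
    print(gini(one_hotspot))    # all activity in one of four areas gives 0.75
```

A lower value means activity is spread more evenly across areas, which is the sense in which the web data shows less concentration than BRES or IDBR.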

13

4. Results: Geography [2]

Differences in “hotspots” when we compare our data and IDBR.

Conversations with Ukie suggest the extra hubs identified by our analysis are more credible than those using IDBR (Liverpool + Cardiff vs. Hull + Reading)

14

4. Results: Hub composition

One explanation

“New” games hubs with more diversified creative economies tend to include fewer companies covered by official SIC codes, compared with “longstanding” hubs.

15

4. Results: Micro-geography

Our data allows us to map the games industry at the micro (company address) level -> this is policy-relevant information.

16

4. Results: Some issues

Poor availability of financial data (only 6% report it to CH) -> We can’t produce estimates of employment or value added.

We rely on inaccurate trading addresses for our mapping. We know there are issues here.

How many of our companies specialise in games vs. make some games? What goes in and what goes out?

17

5. Conclusions

Lessons learned

Structured domain-specific resources help, but they are not available for all sectors.

It’s not web vs SIC, but web + SIC.

Combining automated data collection and matching with domain knowledge is preferable.

Do not underestimate the risks of errors, or the costs of minimising them.

Next steps

Our strategy to improve the quality of the data is to open it up for the games industry by developing an interactive, dynamic platform. Watch this space.

THANK YOU: [email protected] @JMateosGarcia