king hug uk

Posted on 21 May 2015

DESCRIPTION

Dr Relational or: How I Learned to Stop Worrying and Love the Database (Andy Done, Data Warehouse Lead, King). In the face of explosive growth, King's Hadoop data warehouse simply wasn't scaling fast enough. Find out why King is extending its Big Data platform with the MPP database ExaSol and processing its data hundreds of times faster.

TRANSCRIPT

© King.com Ltd 2013 – Public

Database

Relational

Agenda

•  Welcome!
•  A brief history of King
•  King data platform evolution
    •  Enter Hive
    •  Hive + DB
    •  Hive + better DB
•  Questions?

A brief history of King

Who?

Where?

Web, social, mobile

King in numbers

•  100 million daily active users
•  1 billion game plays per day
•  8 offices
•  10 billion events per day
•  Lots and lots of data…

A brief history of me andy.done@king.com

King data platform evolution

Enter Hive

The road to big

[Chart: compressed event data per day (gigabytes/day, scale 0 to 350), February 2011 to April 2013, split into Browser and Mobile. Annotations: "10 nodes", "20 nodes", "40 nodes", "Qlikview says no", "Infobright CE says no".]

Scaling accomplished

Hive says…

Data exploration

•  COUNT(*)
•  SELECT DISTINCT
•  COUNT, SUM… GROUP BY date
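The three query shapes above are typical first-pass exploration over raw event data. As a minimal sketch, assuming a hypothetical `events` table with `event_date`, `user_id` and `amount_cents` columns (none of these names come from the talk), here they are run against an in-memory SQLite database from Python; the SQL itself is close to what one would write in HiveQL.

```python
import sqlite3

# Hypothetical toy table standing in for raw game events; the table and
# column names are illustrative and do not come from the talk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_date TEXT, user_id INTEGER, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2013-04-01", 1, 99),
        ("2013-04-01", 2, 199),
        ("2013-04-02", 1, 99),
    ],
)

# COUNT(*): how many events are there in total?
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# SELECT DISTINCT: which users appear at all?
users = [row[0] for row in conn.execute(
    "SELECT DISTINCT user_id FROM events ORDER BY user_id"
)]

# COUNT, SUM ... GROUP BY date: one summary row per day.
daily = conn.execute(
    "SELECT event_date, COUNT(*), SUM(amount_cents) "
    "FROM events GROUP BY event_date ORDER BY event_date"
).fetchall()

print(total)  # 3
print(users)  # [1, 2]
print(daily)  # [('2013-04-01', 2, 298), ('2013-04-02', 1, 99)]
```

Amounts are stored as integer cents so the daily sums are exact; on a real cluster, each of these queries is a full scan of the event data, which is why they get slow as volumes grow.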

Hive + DB = ?

Data platform 1.0

[Diagram: Games, Event data, Hive, ETL, Reports, Data scientists]

Data platform 1.5

[Diagram: Games, Event data, Hive, DB, ETL, Reports, Data scientists]

Selection criteria

•  ‘Accessible’ pricing (free?)
•  Single node
•  Easy to set up
•  Low maintenance

Contenders ready

•  Infobright
    •  Columnar MySQL engine
    •  Light tuning and hinting
•  InfiniDB
    •  Columnar MySQL engine
    •  Tuning-less
    •  Faster for our use case

How’s that work out?

•  Paid its way
•  Popular
    •  Hundreds of queries per day
•  Stability
•  Ceilings
•  Screwed by mobile

The road to big

[Chart: compressed event data per day (gigabytes/day, scale 0 to 350), February 2011 to April 2013, split into Browser and Mobile. Annotations: "10 nodes", "20 nodes", "40 nodes", "Qlikview says no", "Infobright CE says no", and now "InfiniDB".]

ETL?

Hive + better DB = ?

Data platform 2.0

[Diagram: Game, Event data, Hive, Better DB, ETL, Reports, Data scientists]

State of the market Jan 2013

•  Hadoop on steroids
    •  Hadapt…
    •  Impala
•  Nouveau data
    •  Platfora
    •  SiSense
•  MPP analytics databases
    •  Vertica
    •  ExaSol

Contenders ready

Feature          ExaSol          Vertica
Processing       In memory       Disc optimised
Administration   Web based       Command line
Backup           Web based       Command line
Resiliency       Hot spare       Gradual degradation
Tuning           Self tuning     User tuning
Licensing        Allocated RAM   Total storage
Vendor           Smaller         Larger

Disclaimers

•  Our data
•  Our queries
•  Our use case
•  Our results

This is our data

Table              Row count
Mobile dimension   161 million
Social dimension   600 million
Mobile facts       1 billion
Social facts       6.7 billion

Single query

Cluster stats

                      Vertica   ExaSol   Hive      InfiniDB
Nodes                 4         4        19        1
Cores                 64        48       228       32
RAM                   512 GB    288 GB   1216 GB   300 GB
Discs                 96        32       76        4
Hardware cost (USD)   $$$$      $$       $$        $
Total cost (USD)      $$$$$$    $$$$$    $$        $$

Concurrency 2

Concurrency 4

Concurrency 8

Concurrency 16
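The talk does not say how the concurrency runs were driven. One common way to structure such a test, sketched here purely as an assumption, is to submit the same batch of queries at each concurrency level (2, 4, 8, 16) through a thread pool and record the wall-clock time; `fake_query`, `run_at_concurrency` and `benchmark` are hypothetical names, and `fake_query` merely sleeps in place of a real database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_query(query_id: int) -> int:
    """Hypothetical stand-in for submitting one query to the database."""
    time.sleep(0.05)  # pretend the server spends 50 ms on the query
    return query_id


def run_at_concurrency(run_query: Callable[[int], int],
                       n_queries: int, concurrency: int) -> float:
    """Run n_queries with at most `concurrency` in flight; return wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_query, range(n_queries)))
    elapsed = time.perf_counter() - start
    if len(results) != n_queries:
        raise RuntimeError("not every query completed")
    return elapsed


def benchmark(levels: List[int], n_queries: int = 16) -> Dict[int, float]:
    """Time the same batch of queries at each concurrency level."""
    return {level: run_at_concurrency(fake_query, n_queries, level)
            for level in levels}


if __name__ == "__main__":
    for level, seconds in benchmark([2, 4, 8, 16]).items():
        print(f"concurrency {level:>2}: {seconds:.2f} s")
```

Because the stand-in only sleeps, its timings shrink cleanly as concurrency rises; against a real database the interesting signal is how far that pattern breaks down once concurrent queries compete for the same cores, memory and discs.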

Overall run time

Picture:words

$1.9m

=4 ExaSol nodes

420 Hive nodes

This is a test

•  Ad hoc query tests
•  DML
    •  INSERTs
    •  UPDATEs
    •  DELETEs

And in the real world

•  Faster processing times
    •  4.5 hours to 20 minutes
•  Happier analysts
•  Happier data warehouse engineers
•  Happier ops
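As a worked check of the processing-time figure, assuming both times refer to the same end-to-end job, the drop from 4.5 hours to 20 minutes is a 13.5-fold speed-up:

```latex
\frac{4.5\,\text{h}}{20\,\text{min}}
  = \frac{4.5 \times 60\,\text{min}}{20\,\text{min}}
  = \frac{270}{20}
  = 13.5
```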

Conclusions

•  For structured workloads, consider a good analytic database to complement your Hadoop infrastructure

•  ExaSol was an excellent fit for our use case
•  We’ll let you know how we get on!

Questions?

We’re hiring!

Thank you
