In-Memory Datagrids (with examples of Hazelcast)

SUPERCOMPUTING FOR THE REST OF US IN-MEMORY DATA-GRIDS * RALPH WINZINGER, SENACOR TECHNOLOGIES

Upload: ralph-winzinger

Post on 12-Apr-2017

Page 1: In-Memory Datagrids (with examples of Hazelcast)

SUPERCOMPUTING FOR THE REST OF US

IN-MEMORY DATA-GRIDS

*

RALPH WINZINGER, SENACOR TECHNOLOGIES

Page 2: In-Memory Datagrids (with examples of Hazelcast)

RALPH WINZINGER

WHO AM I?

• Senior Technical Leader @ Senacor Technologies

• evaluating new technologies

• planning of the Senacor-internal academy

• hacker :-)

• Senacor Technologies, several offices in Germany

• partner for large IT transformations @ big brands

• finance, logistics, industry, insurance, government

Page 3: In-Memory Datagrids (with examples of Hazelcast)

NEW CHALLENGES DUE TO DIGITIZATION

PHYSICAL MEETS DIGITAL

• nature of applications is changing

• increased amount of connectivity and communication

• increased amount of data collected from sensors and apps

• expectations of customers are changing

• services have to be available at any time

• responses need to be delivered immediately

Page 4: In-Memory Datagrids (with examples of Hazelcast)

NEW CHALLENGES DUE TO DIGITIZATION

PHYSICAL MEETS DIGITAL

MORE DATA

MORE REQUESTS

HIGHER AVAILABILITY

HIGHER PERFORMANCE

?

Page 6: In-Memory Datagrids (with examples of Hazelcast)

MASSIVE PARALLEL APPROACH

DOES IT SCALE?

[Timeline from 2000 to 2010, labeled "Web 2.0"]

• 2003: Google Distributed Filesystem

• 2004: Google Map Reduce

• 2006: Google Big Table

• 2007: Amazon Dynamo

• 2009: Facebook Cassandra

scaling to billions of requests per day with commodity hardware

- scaling out -

Page 7: In-Memory Datagrids (with examples of Hazelcast)

SCALE UP VS. SCALE OUT

DOES IT SCALE?

• scaling up is easy but expensive

• every piece of technology has upper limits

• scaling out is cheap but has certain drawbacks

• clustering has been a commodity for many years now, but it primarily addresses logic, not data

• synchronization issues

Page 10: In-Memory Datagrids (with examples of Hazelcast)

EVOLUTION OF PERFORMANCE AND PRICING

CAPABILITIES & COSTS

[Figure: chart with curves labeled NETWORK LATENCY, MEMORY CAPACITY and Price]

THIS IS THE BASE FOR IN-MEMORY DATA GRIDS

Page 12: In-Memory Datagrids (with examples of Hazelcast)

IN-MEMORY DATA-GRIDS

JUST KEEP IT IN MIND

• IN-MEMORY DATA

• all data needed is supposed to be kept in memory

• HEAP / RAM is becoming a first-class citizen

• GRID

• too big for one node, so data is distributed across the cluster

• there are already a couple of players out there

• Hazelcast, Oracle, Terracotta, Infinispan, GridGain, …

MEMORY ×1, NETWORK ×100, DISK ×1000

Page 14: In-Memory Datagrids (with examples of Hazelcast)

EMBEDDED OR CLIENT/SERVER IMDG APPROACH

MAKI AND NIGIRI

• embedded IMDG

• every node instance of an app contributes to the overall memory

• client / server

• dedicated memory cluster, apart from the application

Page 15: In-Memory Datagrids (with examples of Hazelcast)

EMBEDDED OR CLIENT/SERVER IMDG APPROACH

MAKI AND NIGIRI

[Figure, embedded: NODE1 to NODE4, each running an APP together with its own MEMORY]

[Figure, client/server: NODE1 to NODE4 running only the APP, connected to a separate cluster of MEMORY nodes]

Page 16: In-Memory Datagrids (with examples of Hazelcast)

DISTRIBUTED DATA AND THE CAP THEOREM

… GO, CHOOSE TWO OF THEM!

or even better: "drop one of them"

• P (partition tolerance): actually no choice, it cannot be dropped as long as we are in a network

• dropping A (availability): use a quorum - if enough nodes hold the same data, that is the truth. Might get expensive

• dropping C (consistency): tolerate a "split brain" and keep on working. Might get hard to merge

[Figure: CAP triangle with corners C, A and P]
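The quorum option can be sketched as a strict-majority vote over the answers of the reachable nodes. This is only an illustration of the idea, not how Hazelcast or any other IMDG implements it; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a value only counts as "the truth" if a strict
// majority of ALL cluster nodes (reachable or not) reports it.
final class QuorumSketch {

    // replies: values returned by the nodes that answered;
    // clusterSize: total number of nodes, including unreachable ones.
    static Optional<String> quorumRead(List<String> replies, int clusterSize) {
        int needed = clusterSize / 2 + 1; // strict majority
        Map<String, Integer> votes = new HashMap<>();
        for (String reply : replies) {
            int count = votes.merge(reply, 1, Integer::sum);
            if (count >= needed) {
                return Optional.of(reply);
            }
        }
        // No quorum: refuse to answer, i.e. give up availability.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // 5 nodes, 3 of the 4 answers agree: quorum reached.
        System.out.println(quorumRead(List.of("v2", "v2", "v1", "v2"), 5));
        // 5 nodes, a network split leaves only 2 reachable: no quorum.
        System.out.println(quorumRead(List.of("v2", "v2"), 5));
    }
}
```

The minority side of a split cannot reach the required count, so it stops answering instead of diverging. That is the price paid in availability.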

Page 18: In-Memory Datagrids (with examples of Hazelcast)

HIGH DENSITY DATA

HONEY, I SHRUNK THE DATA

• serialization has a massive impact on

• performance - how fast can data be serialized and deserialized?

• throughput - how big is the data on the wire?

• volume - how much data fits in memory?

• go & compare Java, XML, JSON, Protobuf, Cap'n Proto, Thrift, …

• … and be surprised!

• Hypercast = Hazelcast + C24 Preon
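To get a feeling for the size difference, the following sketch serializes the same object once with standard Java serialization and once with a hand-written compact encoding. The `Trade` class and both helper methods are made up for this illustration, and the compact format is a stand-in for what Protobuf-style encoders achieve, not one of the formats named above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSizeSketch {

    // Hypothetical example record with three primitive fields.
    static final class Trade implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final int quantity;
        final double price;

        Trade(long id, int quantity, double price) {
            this.id = id;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // Standard Java serialization: writes class metadata along with the data.
    static byte[] javaSerialize(Trade trade) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(trade);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hand-written encoding: only the raw field values (8 + 4 + 8 bytes).
    static byte[] compactSerialize(Trade trade) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(trade.id);
            out.writeInt(trade.quantity);
            out.writeDouble(trade.price);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Trade trade = new Trade(42L, 100, 99.5);
        System.out.println("java serialization: " + javaSerialize(trade).length + " bytes");
        System.out.println("compact encoding:   " + compactSerialize(trade).length + " bytes");
    }
}
```

The compact form is always 20 bytes here. The Java-serialized form is larger because it also carries the class name and field descriptors, and that overhead is paid per object stored or sent.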

Page 20: In-Memory Datagrids (with examples of Hazelcast)

OFF-HEAP MEMORY

LEAVING THE SANDBOX

• IMDGs keep lots of data in memory - say hello to our friend, the garbage collector!

• organizational overhead is unavoidable when millions of objects are stored on the heap

• tuning and a deep understanding of garbage collection are mandatory

• off-heap memory to the rescue

• data is not stored on the heap but in explicitly allocated areas

• the IMDG is responsible for deallocating the memory

sun.misc.Unsafe
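A minimal sketch of the off-heap idea, with one deliberate substitution: instead of the unsupported Unsafe class it uses the JDK's public `ByteBuffer.allocateDirect`, which also places the data outside the Java heap. The `OffHeapLongStore` class is hypothetical and only stores fixed-size `long` slots.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: fixed-size slots of long values kept outside the
// Java heap. Only the small ByteBuffer handle lives on the heap, so the
// garbage collector does not have to track millions of value objects.
final class OffHeapLongStore {
    private final ByteBuffer buffer;

    OffHeapLongStore(int capacity) {
        // allocateDirect requests native (off-heap) memory.
        this.buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES);
    }

    void put(int index, long value) {
        buffer.putLong(index * Long.BYTES, value); // absolute write
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES); // absolute read
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapLongStore store = new OffHeapLongStore(1_000);
        store.put(7, 123_456_789L);
        System.out.println("value at slot 7: " + store.get(7));
        System.out.println("off-heap: " + store.isOffHeap());
    }
}
```

With a direct buffer the native memory is released once the buffer object itself is collected. With raw Unsafe allocation nothing is freed automatically, which is why the IMDG has to deallocate explicitly.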

Page 22: In-Memory Datagrids (with examples of Hazelcast)

DATA SHARDING & ELASTICITY

WHERE DID IT GO?

• scaling out with distributed data only makes sense when the data is partitioned - how do we find the right partition?

• an IMDG is quite close to a HashMap - partitions are buckets

• partitionID = hashCode() % num_partitions

• now think of a distributed HashMap - partitions are scattered over our cluster

[Figure: partitions 1 to P spread over NODE 1 (1, 2, 3), NODE 2 (4, 5, 6), NODE 3 (7, 8, 9), …, NODE N (P-2, P-1, P)]

Page 23: In-Memory Datagrids (with examples of Hazelcast)

DATA SHARDING & ELASTICITY

WHERE DID IT GO?

[Figure: the same cluster with an additional NODE N+1; NODE N now holds P-5, P-4, P-3 and NODE N+1 holds P-2, P-1, P]

Page 24: In-Memory Datagrids (with examples of Hazelcast)

DATA SHARDING & ELASTICITY

WHERE DID IT GO?

[Figure: the cluster grows again; NODE N holds P-8, P-7, P-6, NODE N+1 holds P-5, P-4, P-3, and a further node labeled EC3 holds P-2, P-1, P]
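The partition lookup can be sketched in a few lines. The round-robin placement of partitions on nodes is an assumption made for this example (the slides show contiguous blocks), and a real IMDG may hash the serialized key instead of calling `hashCode()`. The partition count of 271 is Hazelcast's default; all class and method names are hypothetical.

```java
// Hypothetical sketch of key -> partition -> node lookup.
final class PartitionSketch {

    // partitionID = hashCode() % num_partitions, kept non-negative.
    // Plain % can return a negative value in Java because hashCode()
    // may be negative, so floorMod is used instead.
    static int partitionId(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Assumed placement rule: partitions are dealt out round-robin.
    static int ownerNode(int partitionId, int numNodes) {
        return partitionId % numNodes;
    }

    public static void main(String[] args) {
        int numPartitions = 271;
        String key = "customer-4711";
        int partition = partitionId(key, numPartitions);
        System.out.println("partition: " + partition);
        System.out.println("owner with 3 nodes: " + ownerNode(partition, 3));
        System.out.println("owner with 4 nodes: " + ownerNode(partition, 4));
    }
}
```

Because the number of partitions stays fixed, adding a node only changes the partition-to-node table. The key-to-partition mapping never changes, so whole partitions can be migrated without rehashing individual entries.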

Page 25: In-Memory Datagrids (with examples of Hazelcast)

ELDEST MEMBER VS. CENTRAL MANAGEMENT

HAVING A PARTY

• there is no central management instance in a (Hazelcast) IMDG cluster, so there is no single point of failure

• autodiscovery via network broadcast

• there is always one node which knows all other members - like the first person at a party, who gets introduced to all the other guests
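The "eldest member" idea can be illustrated with a list kept in join order: whoever joined first coordinates, and if that member leaves, the next-oldest one takes its place without any election. This is a toy model under that assumption, not Hazelcast's membership protocol; the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: members are tracked in the order they joined.
final class EldestMemberSketch {
    private final List<String> membersInJoinOrder = new ArrayList<>();

    void join(String member) {
        membersInJoinOrder.add(member);
    }

    void leave(String member) {
        membersInJoinOrder.remove(member);
    }

    // The longest-running member acts as coordinator.
    String eldest() {
        if (membersInJoinOrder.isEmpty()) {
            throw new IllegalStateException("cluster has no members");
        }
        return membersInJoinOrder.get(0);
    }

    public static void main(String[] args) {
        EldestMemberSketch cluster = new EldestMemberSketch();
        cluster.join("node-a");
        cluster.join("node-b");
        cluster.join("node-c");
        System.out.println("eldest: " + cluster.eldest());
        cluster.leave("node-a"); // the coordinator goes away
        System.out.println("eldest after failure: " + cluster.eldest());
    }
}
```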

Page 26: In-Memory Datagrids (with examples of Hazelcast)

FAILOVER

AND IF I PULLED THE PLUG???

• data is not only sharded but also stored redundantly, so the cluster can recover from failing nodes

[Figure: NODE 1 to NODE 4, each holding partitions 1, 2, 3]
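A small model of the failover idea: every partition has a primary copy and one backup on a different node, and when a node disappears the surviving backup becomes the primary. The placement rule (backup on the next node) and all names are assumptions for this sketch. A real grid would also create fresh backups afterwards, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one primary and one backup per partition.
final class FailoverSketch {
    // Per partition: replica holders, index 0 = primary, index 1 = backup.
    private final List<List<String>> replicas = new ArrayList<>();

    FailoverSketch(List<String> nodes, int numPartitions) {
        if (nodes.size() < 2) {
            throw new IllegalArgumentException("need at least two nodes for a backup");
        }
        for (int p = 0; p < numPartitions; p++) {
            String primary = nodes.get(p % nodes.size());
            // Assumed rule: the backup lives on the next node in the list.
            String backup = nodes.get((p + 1) % nodes.size());
            replicas.add(new ArrayList<>(List.of(primary, backup)));
        }
    }

    String primaryOf(int partition) {
        return replicas.get(partition).get(0);
    }

    // A node crashed: drop it everywhere. Where it was the primary,
    // the backup moves up to index 0 and takes over.
    void nodeFailed(String node) {
        for (List<String> holders : replicas) {
            holders.remove(node);
        }
    }

    public static void main(String[] args) {
        FailoverSketch grid = new FailoverSketch(List.of("A", "B", "C", "D"), 12);
        System.out.println("primary of partition 0: " + grid.primaryOf(0));
        grid.nodeFailed("A"); // pull the plug
        System.out.println("primary of partition 0 after failure: " + grid.primaryOf(0));
    }
}
```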

Page 37: In-Memory Datagrids (with examples of Hazelcast)

FAILOVER

AND IF I PULLED THE PLUG???

[Figure: last step of the failover animation; NODE 1 to NODE 4 with partitions 1, 2, 3, their backup partitions, and copies marked 1', 2', 3']

Page 38: In-Memory Datagrids (with examples of Hazelcast)

DISTRIBUTED COMPUTING IN AN IMDG

DIVIDE AND CONQUER

• reading data from the cluster and processing it is a straightforward approach - but not always clever

• it might also be feasible to send algorithms to the cluster and distribute processing

• MapReduce

• Hazelcast has built-in support for distributed executors

• think of it as serializable Runnables which can be sent and executed on a different node
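The "serializable Runnable" idea can be shown with the JDK alone: a task is turned into bytes, which is what would travel over the network, then rebuilt and executed on the receiving side. The sketch deliberately does not use Hazelcast's executor API; `SumTask`, `toWire` and `receiveAndRun` are hypothetical names, and a `Callable` is used instead of a `Runnable` so that a result comes back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;

final class ShippedTaskSketch {

    // A task that carries its own logic and data; Serializable so it can travel.
    static final class SumTask implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final int[] values;

        SumTask(int... values) {
            this.values = values;
        }

        @Override
        public Integer call() {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            return sum;
        }
    }

    // "Sending" side: the bytes that would go over the wire.
    static byte[] toWire(Serializable task) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Receiving" node: rebuild the task from bytes and execute it locally.
    static Object receiveAndRun(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            Callable<?> task = (Callable<?>) in.readObject();
            return task.call();
        } catch (Exception e) {
            throw new IllegalStateException("task could not be executed", e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = toWire(new SumTask(1, 2, 3));
        System.out.println("task size on the wire: " + wire.length + " bytes");
        System.out.println("result computed by the receiver: " + receiveAndRun(wire));
    }
}
```

In a real grid the receiving node must have the task class on its classpath, and sending the small task to the node that owns the data avoids pulling the data to the caller.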

Page 39: In-Memory Datagrids (with examples of Hazelcast)

HAZELCAST FEATURES

NOW, WHAT’S INSIDE?

Page 40: In-Memory Datagrids (with examples of Hazelcast)

CODE DEMO

LET’S GET OUR HANDS DIRTY!

• not production grade!

• check it out on GitHub

Page 41: In-Memory Datagrids (with examples of Hazelcast)

THANKS!

@rwinz [email protected] https://github.com/rwinzinger