In-Memory Computing Essentials for Architects and Engineers

Post on 21-Jan-2018

© 2017 GridGain Systems, Inc.

In-Memory Performance

Durability of Disk

In-Memory Computing Essentials for Java Developers

Denis Magda

Ignite PMC Chair

GridGain Director of Product Management

Agenda

• Apache Ignite Overview

• Clustering and Deployment

• Distributed Storage

• Distributed SQL

• Distributed Computations

• Machine Learning

• Memory Architecture & Persistence

Apache Ignite In-Memory Computing Platform

• Use cases: IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco

• APIs: SQL, Transactions, Compute, Services, ML, Streaming, Key/Value

• Memory-Centric Storage

• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)

• Third-Party Persistence (RDBMS, HDFS, NoSQL)

Clustering and Deployment

Clustering

• Server Nodes

    • Act as containers for data and computations

    • Generally started as standalone processes

• Client Nodes

    • Provide a cluster entry point to run operations

    • Embedded in application code

Deployment

• Nodes are logical entities

    • Each node runs in a JVM process

    • Multiple nodes can share a single JVM process

• On-premises and cloud

    • Physical server or VM

    • AWS, Azure, Google Compute Engine

    • Kubernetes, Mesos, YARN

Distributed Storage

Distributed Key-Value Store

• Distributed partitioned hash map

• APIs: JCache, Transactions, Compute, SQL

• ACID transactions

• Dynamic scaling

• 3rd party storage caching (RDBMS, NoSQL, HDFS)

• Cluster of server nodes, each with its own durable memory

Where Does an Entry Go?

put(key, value)

Ignite Node 1 or Ignite Node 2?

Key to Node Mapping

Key -> Partition -> Server Node (with on-disk storage)
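
The two-step mapping (key to partition, partition to node) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `KeyToNodeSketch`, the modulo hashing, the round-robin node layout and the partition count of 16 are all hypothetical choices, not Ignite's actual affinity function.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: Ignite's real affinity function is more elaborate.
final class KeyToNodeSketch {

    /** Maps a key to a partition number in the range [0, partitions). */
    static int partitionFor(Object key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Maps a partition to its owning node with a simple round-robin layout. */
    static String nodeFor(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("Ignite Node 1", "Ignite Node 2");
        int partitions = 16; // hypothetical partition count

        for (Object key : new Object[] {1, 2, 17, "Toronto"}) {
            int partition = partitionFor(key, partitions);
            System.out.println(key + " -> partition " + partition + " -> " + nodeFor(partition, nodes));
        }
    }
}
```

Because the partition depends only on the key, any node or client can compute where an entry lives without asking a central directory.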

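Caches and Partitions

A cache is split into partitions, and each partition holds a subset of the entries:

• Partition 1: (K1, V1), (K2, V2), (K3, V3), (K4, V4)

• Partition 2: (K5, V5), (K6, V6), (K7, V7), (K8, V8), (K9, V9)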

Partitions Distribution

• Ignite Node 1: partitions 0, 2, 4, 6, 8, 10, 12, 14

• Ignite Node 2: partitions 1, 3, 5, 7, 9, 11, 13, 15

Where Does an Entry Go?

put(key, value)

• Ignite Node 1: partitions 0, 2, 4

• Ignite Node 2: partitions 1, 3, 5

The key is mapped to a partition, and the entry is stored on the node that owns that partition.

Backup Copies

• Four Ignite nodes, each the primary owner of one partition: 0, 1, 2, 3

• Each node also holds a backup copy of a partition whose primary copy lives on another node
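
A minimal sketch of one way to place primary and backup copies so that a backup never shares a node with its primary. The "next node in the ring" rule and the class `BackupAssignmentSketch` are assumptions made for illustration; the slide does not say which node holds which backup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placement rule: the primary goes to node (partition % nodeCount),
// and each backup goes to the next node in the ring.
final class BackupAssignmentSketch {

    /** Returns the node indexes that hold a partition: the primary first, then its backups. */
    static List<Integer> ownersOf(int partition, int nodeCount, int backups) {
        if (backups >= nodeCount) {
            throw new IllegalArgumentException("Need more nodes than backup copies");
        }
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i <= backups; i++) {
            owners.add((partition + i) % nodeCount);
        }
        return owners;
    }

    public static void main(String[] args) {
        // Four nodes, four partitions, one backup each, as in the diagram.
        for (int partition = 0; partition < 4; partition++) {
            System.out.println("partition " + partition + " -> nodes " + ownersOf(partition, 4, 1));
        }
    }
}
```

With this layout, losing any single node still leaves one copy of every partition in the cluster.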

Distributed SQL

• Connectivity: JDBC, ODBC, SQL API

• Cross-platform compatibility: Java, .NET, C++, BI tools

• DDL and DML support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER

• Indexes in RAM or on disk

• Dynamic scaling

• Apache Ignite cluster of server nodes, each with its own durable memory

Connectivity

• JDBC

• ODBC

• REST

• Java, .NET and C++ APIs

import java.sql.Connection;
import java.sql.DriverManager;

// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.0.50");

# Connect from the command line with SQLLine.
./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/

Data Definition Language

• CREATE/DROP TABLE

• CREATE/DROP INDEX

• ALTER TABLE

• Changes Durability

    • Ignite Native Persistence

CREATE TABLE `city` (
  `ID` INT(11),
  `Name` CHAR(35),
  `CountryCode` CHAR(3),
  `District` CHAR(20),
  `Population` INT(11),
  PRIMARY KEY (`ID`, `CountryCode`)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";

Data Manipulation Language

• Complies with the ANSI-99 specification

• Fault-tolerant and consistent

• INSERT, UPDATE, DELETE

• SELECT

    • JOINs

    • Subqueries

SELECT country.name, city.name, MAX(city.population) AS max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;

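Affinity Collocation

Tables: Country, Language, City

• key (country = 5) -> partition 10

• key (cityId = 10, countryId = 5) -> partition 10

• key (cityId = 11, countryId = 9) -> partition 12

Entries that share an affinity key (here, the country) map to the same partition and are stored on the same server node.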
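
The idea can be sketched in plain Java: only the affinity part of a composite key takes part in partition mapping, so a city always lands in the partition of its country. The class names, the modulo hash and the partition count of 16 are hypothetical, so the partition numbers differ from those on the slide.

```java
// Illustrative sketch only: not Ignite's actual affinity function.
final class AffinityCollocationSketch {

    static final int PARTITIONS = 16; // hypothetical partition count

    /** Composite key of a city; countryId plays the role of the affinity key. */
    static final class CityKey {
        final int cityId;
        final int countryId;

        CityKey(int cityId, int countryId) {
            this.cityId = cityId;
            this.countryId = countryId;
        }
    }

    /** Maps an affinity key to a partition in the range [0, PARTITIONS). */
    static int partitionOf(int affinityKey) {
        return Math.floorMod(Integer.hashCode(affinityKey), PARTITIONS);
    }

    /** A country is mapped by its own id. */
    static int partitionOfCountry(int countryId) {
        return partitionOf(countryId);
    }

    /** A city is mapped by countryId only; cityId is ignored on purpose. */
    static int partitionOfCity(CityKey key) {
        return partitionOf(key.countryId);
    }

    public static void main(String[] args) {
        System.out.println("country 5 -> partition " + partitionOfCountry(5));
        System.out.println("city (10, 5) -> partition " + partitionOfCity(new CityKey(10, 5)));
        System.out.println("city (11, 9) -> partition " + partitionOfCity(new CityKey(11, 9)));
    }
}
```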

Collocated Joins

1. Initial Query

2. Query execution over local data

3. Reduce multiple results into one

SELECT ct.name, c.name
FROM Country AS ct
JOIN City AS c ON ct.id = c.countryId
WHERE ct.name = 'Canada';

• Ignite Node: Canada with Toronto, Ottawa, Montreal, Calgary

• Ignite Node: India with Mumbai, New Delhi

Non-Collocated Joins

1. Initial Query

2. Query execution (local + remote data)

3. Potential data movement

4. Reduce multiple results into one

SELECT ct.name, c.name
FROM Country AS ct
JOIN City AS c ON ct.id = c.countryId
WHERE ct.name = 'Canada';

• Ignite Node: Canada with Toronto, Calgary

• Ignite Node: India with Montreal, Ottawa, Mumbai, New Delhi

• Data moved in step 3: Montreal, Ottawa
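
A plain-Java sketch of why step 3 matters. When Canadian cities are spread over both nodes, a join over local data alone misses some rows, while a join that also reads remote city data finds them all. The two-node layout is one reading of the diagram, and the class `JoinSketch` and the country ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinSketch {

    /** One node's local slice of the Country and City tables. */
    static final class Node {
        final Map<Integer, String> countries = new HashMap<>(); // country id -> name
        final Map<String, Integer> cities = new HashMap<>();    // city name -> country id
    }

    /** Joins the given countries with the given cities for one country name. */
    private static List<String> join(Map<Integer, String> countries,
                                     Map<String, Integer> cities, String countryName) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> city : cities.entrySet()) {
            if (countryName.equals(countries.get(city.getValue()))) {
                result.add(city.getKey());
            }
        }
        return result;
    }

    /** Every node joins only its local data; partial results are reduced into one list. */
    static List<String> localJoin(List<Node> nodes, String countryName) {
        List<String> reduced = new ArrayList<>();
        for (Node node : nodes) {
            reduced.addAll(join(node.countries, node.cities, countryName));
        }
        Collections.sort(reduced);
        return reduced;
    }

    /** Every node joins its countries with the cities of all nodes (data movement). */
    static List<String> joinWithDataMovement(List<Node> nodes, String countryName) {
        List<String> reduced = new ArrayList<>();
        for (Node node : nodes) {
            for (Node source : nodes) {
                reduced.addAll(join(node.countries, source.cities, countryName));
            }
        }
        Collections.sort(reduced);
        return reduced;
    }

    /** Two-node layout as read from the diagram (an interpretation). */
    static List<Node> sampleCluster() {
        Node first = new Node();
        first.countries.put(1, "Canada");
        first.cities.put("Toronto", 1);
        first.cities.put("Calgary", 1);

        Node second = new Node();
        second.countries.put(2, "India");
        second.cities.put("Montreal", 1);
        second.cities.put("Ottawa", 1);
        second.cities.put("Mumbai", 2);
        second.cities.put("New Delhi", 2);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println("local only:    " + localJoin(sampleCluster(), "Canada"));
        System.out.println("with movement: " + joinWithDataMovement(sampleCluster(), "Canada"));
    }
}
```

The extra inner loop stands in for network traffic, which is the cost that collocated data avoids.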

Distributed Computations

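Compute Grid

• A computation C is split into C1 and C2, which run on different nodes of the Ignite cluster: C = C1 + C2

• The partial results are combined: R = R1 + R2

• With two nodes, the result is ready in T/2 time

• Automatic failover

• Load balancing

• Zero deployment

C = Compute, R = Result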
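
The split-and-combine pattern (C = C1 + C2, R = R1 + R2) can be sketched with a local thread pool standing in for two cluster nodes. This is an analogy only: the class `SplitComputeSketch` is hypothetical and does not use the Ignite compute API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Two threads stand in for two cluster nodes.
final class SplitComputeSketch {

    /** Sums the integers in [from, to). Stands in for one part (C1 or C2) of the job. */
    static long sumRange(long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    /** Splits the job in two, runs both halves in parallel and returns R = R1 + R2. */
    static long splitSum(long n) {
        final long mid = n / 2;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Long> c1 = () -> sumRange(0, mid);
            Callable<Long> c2 = () -> sumRange(mid, n);
            Future<Long> r1 = pool.submit(c1);
            Future<Long> r2 = pool.submit(c2);
            return r1.get() + r2.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sum of 0..999999 = " + splitSum(1_000_000));
    }
}
```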

Client-Server Processing

1. Initial Request

2. Fetch data from remote nodes

3. Process entire data-set

Co-located Processing

1. Initial Request

2. Co-located processing with data

3. Reduce multiple results into one

Both diagrams show a client node and two server nodes with on-disk storage.
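
A rough comparison of the two models in plain Java, counting how many values have to travel to the client. The lists stand in for data held by server nodes, and the class `ProcessingModesSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProcessingModesSketch {

    /** Client-server model. Returns {sum, number of values moved to the client}. */
    static long[] clientServerSum(List<List<Integer>> serverNodes) {
        List<Integer> fetched = new ArrayList<>();
        for (List<Integer> nodeData : serverNodes) {
            fetched.addAll(nodeData); // step 2: fetch data from remote nodes
        }
        long sum = 0;
        for (int value : fetched) {
            sum += value; // step 3: process the entire data set on the client
        }
        return new long[] {sum, fetched.size()};
    }

    /** Co-located model. Returns {sum, number of values moved to the client}. */
    static long[] colocatedSum(List<List<Integer>> serverNodes) {
        long sum = 0;
        long moved = 0;
        for (List<Integer> nodeData : serverNodes) {
            long partial = 0;
            for (int value : nodeData) {
                partial += value; // step 2: runs where the data lives
            }
            sum += partial; // step 3: reduce the partial results into one
            moved++;        // only one partial result per node travels
        }
        return new long[] {sum, moved};
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5));
        System.out.println("client-server: " + Arrays.toString(clientServerSum(nodes)));
        System.out.println("co-located:    " + Arrays.toString(colocatedSum(nodes)));
    }
}
```

Both methods return the same sum; the difference is that the co-located variant moves one value per node instead of the whole data set.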

Machine Learning

Genetic Algorithm Grid

• Biological evolution simulation

• Chromosome and genes cluster

• Collocated computation: each node of the Ignite cluster runs its own F, C and M steps (F1, C1, M1 and F2, C2, M2)

• F = F1 + F2, C = C1 + C2, M = M1 + M2

F = Fitness Calculation, C = Crossover, M = Mutation
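
For readers new to genetic algorithms, here is a minimal sketch of the three operations on a bit-string chromosome. The class `GeneticStepsSketch`, the bit-count fitness and the single-point crossover are generic textbook choices assumed for illustration, not the Ignite GA Grid API.

```java
final class GeneticStepsSketch {

    /** F: fitness is the number of genes that are set. */
    static int fitness(boolean[] genes) {
        int score = 0;
        for (boolean gene : genes) {
            if (gene) {
                score++;
            }
        }
        return score;
    }

    /** C: single-point crossover; the parents are assumed to have equal length. */
    static boolean[] crossover(boolean[] first, boolean[] second, int point) {
        boolean[] child = new boolean[first.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = i < point ? first[i] : second[i];
        }
        return child;
    }

    /** M: returns a copy of the chromosome with one gene flipped. */
    static boolean[] mutate(boolean[] genes, int index) {
        boolean[] copy = genes.clone();
        copy[index] = !copy[index];
        return copy;
    }

    public static void main(String[] args) {
        boolean[] first = {true, true, false, false};
        boolean[] second = {false, false, true, true};
        boolean[] child = crossover(first, second, 2);
        System.out.println("child fitness: " + fitness(child));
        System.out.println("mutated child fitness: " + fitness(mutate(child, 0)));
    }
}
```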

Machine Learning Grid

• Distributed algorithms: K-Means, Regressions, Decision Trees, Random Forest

• Distributed core algebra: dense and sparse

• Multi-language support: R, C++, Python, Java, Scala, REST

• Large-scale parallelization

• No ETL

• Runs on the server nodes of the cluster, on top of durable memory

Memory Architecture & Persistence

Durable Memory

• Off-heap: removes noticeable GC pauses

• Automatic defragmentation

• Stores a superset of data

• Predictable memory consumption

• Fully transactional (write-ahead log)

• Instantaneous restarts

• Each server node of the Ignite cluster has its own durable memory

Regions and Segments

• Memory split into regions

• Regions split into segments

• Segments include pages

B+Tree

• Self-balancing tree

• Memory & Disk

• Sorted Index

    • Secondary Indexes

• Hash Index

    • Primary Keys

    • Hash code based sorting

Free Lists

• Track pages with approximately equal free space

    • 25% free

    • 75% free

• Essential for updates

    • Give a page with the minimum size needed

    • Reduce fragmentation

    • Lower page compaction activity
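
A simplified free list can be sketched as a set of buckets, each holding pages with a similar amount of free space. A lookup starts at the smallest bucket that could fit the request. The page size, the four buckets and the class `FreeListSketch` are hypothetical values chosen for the example, not Ignite internals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class FreeListSketch {

    static final int PAGE_SIZE = 4096; // hypothetical page size in bytes
    static final int BUCKETS = 4;      // hypothetical: 0-25%, 25-50%, 50-75%, 75-100% free

    static final class Page {
        final int id;
        final int freeBytes;

        Page(int id, int freeBytes) {
            this.id = id;
            this.freeBytes = freeBytes;
        }
    }

    private final List<Deque<Page>> buckets = new ArrayList<>();

    FreeListSketch() {
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayDeque<>());
        }
    }

    /** Index of the bucket that tracks pages with this much free space. */
    static int bucketOf(int freeBytes) {
        return Math.min(BUCKETS - 1, freeBytes * BUCKETS / PAGE_SIZE);
    }

    /** Registers a page under the bucket matching its free space. */
    void add(Page page) {
        buckets.get(bucketOf(page.freeBytes)).add(page);
    }

    /** Removes and returns a page from the smallest bucket that fits, or null if none fits. */
    Page take(int requiredBytes) {
        for (int bucket = bucketOf(requiredBytes); bucket < BUCKETS; bucket++) {
            Iterator<Page> pages = buckets.get(bucket).iterator();
            while (pages.hasNext()) {
                Page page = pages.next();
                if (page.freeBytes >= requiredBytes) {
                    pages.remove();
                    return page;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FreeListSketch freeList = new FreeListSketch();
        freeList.add(new Page(1, 1024)); // about 25% free
        freeList.add(new Page(2, 3072)); // about 75% free
        System.out.println("page for 500 bytes: " + freeList.take(500).id);
    }
}
```

Serving small writes from nearly full pages keeps the emptier pages available for large entries, which is how this structure reduces fragmentation.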

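Ignite Native Persistence

Within a server node:

1. Update: the change is applied in RAM

2. Persist: the change is appended to the write-ahead log

3. Ack: the update is acknowledged

4. Checkpointing: changes are written from RAM to the partition files (Partition File 1 through Partition File N)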
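
The four steps can be sketched with in-memory collections standing in for RAM, the write-ahead log and a partition file. Everything here is a toy model under stated assumptions: the class `NativePersistenceSketch` is hypothetical, and the `recover` method illustrates the general purpose of a write-ahead log rather than Ignite's actual recovery procedure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NativePersistenceSketch {

    private final Map<String, String> ram = new HashMap<>();
    private final List<String[]> writeAheadLog = new ArrayList<>();
    private final Map<String, String> partitionFile = new HashMap<>();
    private final Set<String> dirtyKeys = new HashSet<>();
    private int checkpointedLogRecords = 0;

    /** Steps 1 to 3: update RAM, append to the log, acknowledge. */
    boolean put(String key, String value) {
        ram.put(key, value);                          // 1. Update
        dirtyKeys.add(key);
        writeAheadLog.add(new String[] {key, value}); // 2. Persist
        return true;                                  // 3. Ack
    }

    /** Step 4: copy changed entries from RAM to the partition file. */
    void checkpoint() {
        for (String key : dirtyKeys) {
            partitionFile.put(key, ram.get(key));
        }
        dirtyKeys.clear();
        checkpointedLogRecords = writeAheadLog.size();
    }

    int partitionFileSize() {
        return partitionFile.size();
    }

    /** Rebuilds state from the partition file plus log records written after the last checkpoint. */
    Map<String, String> recover() {
        Map<String, String> state = new HashMap<>(partitionFile);
        for (int i = checkpointedLogRecords; i < writeAheadLog.size(); i++) {
            String[] record = writeAheadLog.get(i);
            state.put(record[0], record[1]);
        }
        return state;
    }

    public static void main(String[] args) {
        NativePersistenceSketch node = new NativePersistenceSketch();
        node.put("a", "1");
        node.checkpoint();
        node.put("b", "2"); // logged, but not yet in the partition file
        System.out.println("partition file entries: " + node.partitionFileSize());
        System.out.println("recovered state: " + node.recover());
    }
}
```

The update is acknowledged after the log append rather than after the checkpoint, which is why the acknowledgement in step 3 does not have to wait for partition files to be rewritten.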

Any Questions?

Thank you for joining us. Follow the conversation.

http://ignite.apache.org

#apacheignite

#denismagda
