apache geode meetup, cork, ireland at cit

40
Introduction to Apache Geode (incubating) Cork, September 2015 Anthony Baker (@metatype) William Markito (@william_markito)

Upload: apache-geode

Post on 15-Apr-2017

611 views

Category:

Software


2 download

TRANSCRIPT

Page 1: Apache Geode Meetup, Cork, Ireland at CIT

Introduction to Apache Geode (incubating)

Cork, September 2015

Anthony Baker (@metatype) William Markito (@william_markito)

Page 2: Apache Geode Meetup, Cork, Ireland at CIT

• Introduction to Geode

• Geode concepts and usage

• The Geode open source project

• StockPrediction demo

Agenda

Page 3: Apache Geode Meetup, Cork, Ireland at CIT

Introduction

Page 4: Apache Geode Meetup, Cork, Ireland at CIT

4

"…an in-memory, distributed database with strong consistency built to support low latency transactional applications at extreme scale.”

Page 5: Apache Geode Meetup, Cork, Ireland at CIT

History

2004 2008 2014

•  Massive increase in data volumes

•  Falling margins per transaction

•  Increasing cost of IT maintenance

•  Need for elasticity in systems

•  Financial Services Providers (every major Wall Street bank)

•  Department of Defense

•  Real Time response needs •  Time to market constraints •  Need for flexible data

models across enterprise •  Distributed development •  Persistence + In-memory

•  Global data visibility needs •  Fast Ingest needs for data •  Need to allow devices to

hook into enterprise data •  Always on

•  Largest travel Portal •  Airlines •  Trade clearing •  Online gambling

•  Largest Telcos •  Large mfrers •  Largest Payroll processor •  Auto insurance giants •  Largest rail systems on

earth

• 1000+ customers in production • Cutting edge use cases

Page 6: Apache Geode Meetup, Cork, Ireland at CIT

Interesting use cases

China RailwayCorporation

5,700 train stations4.5 million tickets per day20 million daily users1.4 billion page views per day40,000 visits per second

*http://pivotal.io/big-data/pivotal-gemfire

Indian Railways

7,000 stations72,000 miles of track23 million passengers daily120,000 concurrent users10,000 transactions per minute

Page 7: Apache Geode Meetup, Cork, Ireland at CIT

Interesting use cases

Indian RailwaysChina Railway Corporation

World: ~7,349,000,000

~36% of the world population

Population: 1,251,695,6161,401,586,609

Page 8: Apache Geode Meetup, Cork, Ireland at CIT

8

Application Patterns

• Caching for speed and scale ➡ Read-Through, Write-Through, Write-Behind

• As the OLTP system of record ➡ Data in-memory for low-latency, on disk for durability

• Parallel compute engine • Real-time analytics

Page 9: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

Page 10: Apache Geode Meetup, Cork, Ireland at CIT

• Cache

• Region

• Member

• Client Cache

• Functions

• Listeners

Geode Concepts and Usage

Page 11: Apache Geode Meetup, Cork, Ireland at CIT

• Cache

• In-memory storage and management for your data

• Configurable through XML, Spring, Java API or CLI

• Collection of Region

Region

Region

Region

Cache

JVM

Concepts

Page 12: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

• Region

• Distributed java.util.Map on steroids (Key/Value)

• Consistent API regardless of where or how data is stored

• Observable (reactive)

• Highly available, redundant on cache Member (s).

• Querying

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Page 13: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

• Region

• Local, Replicated or Partitioned

• In-memory or persistent

• Redundant

• LRU

• Overflow

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

LOCALLOCAL_HEAP_LRULOCAL_OVERFLOWLOCAL_PERSISTENTLOCAL_PERSISTENT_OVERFLOWPARTITIONPARTITION_HEAP_LRUPARTITION_OVERFLOWPARTITION_PERSISTENTPARTITION_PERSISTENT_OVERFLOWPARTITION_PROXYPARTITION_PROXY_REDUNDANTPARTITION_REDUNDANTPARTITION_REDUNDANT_HEAP_LRUPARTITION_REDUNDANT_OVERFLOWPARTITION_REDUNDANT_PERSISTENTPARTITION_REDUNDANT_PERSISTENT_OVERFLOWREPLICATEREPLICATE_HEAP_LRUREPLICATE_OVERFLOWREPLICATE_PERSISTENTREPLICATE_PERSISTENT_OVERFLOWREPLICATE_PROXY

Page 14: Apache Geode Meetup, Cork, Ireland at CIT

• Persistent Regions

• Durability

• WAL for efficient writing

• Consistent recovery

• Compaction

Concepts

Modify k1->v5

Create k6->v6

Create k2->v2

Create k4->v4 Oplog2.crf

Member 1

Modify k4->v7 Oplog3.crf

Put k4->v7

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Server 1 Server N

Page 15: Apache Geode Meetup, Cork, Ireland at CIT

• Member

• A process that has a connection to the system

• A process that has created a cache

• Embeddable within your application

Concepts

Client

Locator

Server

Page 16: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

• Client cache

• A process connected to the Geode server(s)

• Can have a local copy of the data

• Can be notified about events on the servers

Application

GemFire Server

Region

Region

Region Client Cache

Page 17: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

• Functions

• Used for distributed concurrent processing (Map/Reduce, stored procedure)

• Highly available

• Data oriented

• Member oriented

Submit (f1)

f1 , f2 , … fn

Execute Functions

Page 18: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

• Functions

Server

Server

FunctionService.onRegion.withFilter.execute ResultCollector.getResult

Server Distributed System

execute

Server

Server

6

1

result

execute

execute

result result

2

5

3

4 3 4

Server

Partitioned Region Data Store - X

Partitioned Region Data Store - Y

Partitioned Region Data Store - Z

Partitioned Region Data Accessor

Partitioned Region Data Accessor

filter = Keys X, Y Client Region

Page 19: Apache Geode Meetup, Cork, Ireland at CIT

Concepts

• Listeners

• CacheWriter / CacheListener

• AsyncEventListener (queue / batch)

• Parallel or Serial

• Conflation

Page 20: Apache Geode Meetup, Cork, Ireland at CIT

Core Beliefs

Performance

Consistency

Resiliency

Page 21: Apache Geode Meetup, Cork, Ireland at CIT

Why Apache Geode?

© Copyright 2014 Pivotal. All rights reserved.

Pivotal GemFire High Availability and Fault Tolerance in 6 acts

Failing data copies are replaced transparently

Data is replicated to other clusters and sites (WAN)

Network segmentations are identified and fixed automatically

Client and cluster disconnections are handled gracefully

Data is persisted on local disk for ultimate durability

“split brain”

Failed function executions are restarted automatically

restart

Page 22: Apache Geode Meetup, Cork, Ireland at CIT

• Minimize copying

• Minimize contention points

• Flexible consistency model

• Partitioning and parallelism

• Avoid disk seeks

What makes it fast?

Page 23: Apache Geode Meetup, Cork, Ireland at CIT

Benchmarks and Testing

0

2

4

6

8

10

12

14

16

18

0

1

2

3

4

5

6

2 4 6 8 10

Spee

dup

Server(Hosts

speedup

latency4(ms)

CPU4%

Page 24: Apache Geode Meetup, Cork, Ireland at CIT

The Geode Project

Page 25: Apache Geode Meetup, Cork, Ireland at CIT

Why Open Source? Why ASF?

• Open source is fundamentally changing software buying patterns

• Customers get transparency and co-development of features

• It’s the community that matters

• ASF provides a framework for open source

Page 26: Apache Geode Meetup, Cork, Ireland at CIT

Geode Will Be a Significant Apache Project

• 1M+ LOC, over a 1000 person years invested into cutting edge R&D

• Thousands of production customers in very demanding verticals

• Cutting edge use cases that have shaped product thinking

• A core technology team that has stayed together since founding

• Performance differentiators that are baked into every aspect of the product

Page 27: Apache Geode Meetup, Cork, Ireland at CIT

Geode versus GemFire

• Geode is a project supported by the OSS community

• GemFire is product from Pivotal, based on Geode source

• We donated everything but the kitchen sink*

• Development process follows “The Apache Way”

* Multi-site WAN replication, continuous queries, and native (C/C++/.NET) client

Page 28: Apache Geode Meetup, Cork, Ireland at CIT

"Talk is cheap, show me the code"

• Clone & Build

gitclonehttps://github.com/apache/incubator-geodecdincubator-geode./gradlewbuild-Dskip.tests=true

• Start a server

cdgemfire-assembly/build/install/apache-geode./bin/gfshgfsh>startlocator--name=locatorgfsh>startserver--name=servergfsh>createregion--name=myRegion--type=REPLICATE

Page 29: Apache Geode Meetup, Cork, Ireland at CIT

29

&

Page 30: Apache Geode Meetup, Cork, Ireland at CIT

Stock Predictions with

Apache Geode, Spark, and SpringXD

Page 31: Apache Geode Meetup, Cork, Ireland at CIT

• RDD • Dataframe • Driver • Worker

Quick intro to Apache Spark

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."

Page 32: Apache Geode Meetup, Cork, Ireland at CIT

• RDD • Dataframe • Driver • Worker

Quick intro to Apache Spark

“A dataframe is a distributed collection of rows organized into named columns. An abstraction for selecting, filtering and plotting structured data (pandas), previously known as SchemaRDD."

Page 33: Apache Geode Meetup, Cork, Ireland at CIT

• RDD • Dataframe • Driver • Worker

Quick intro to Apache Spark

Page 34: Apache Geode Meetup, Cork, Ireland at CIT

Live Data

Apache Geode / GemFire1- Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

Machine Learning model

4 - Re-training is triggered, updating the model with the latest historical data

Spring XD

Spring XD

Data Temperature

Hot

Warm

Page 35: Apache Geode Meetup, Cork, Ireland at CIT

Machine Learning Concepts

medium avg (x+1)

relative strength (x)

medium avg (x)

price(x)

Machine Learning Model (e.g. Linear Regression)

Page 36: Apache Geode Meetup, Cork, Ireland at CIT

medium avg (x+1)

relative strength (x)

medium avg (x)

price(x)

Machine Learning Model (e.g. Linear Regression)

Features Label

Page 37: Apache Geode Meetup, Cork, Ireland at CIT

Transform Sink

SpringXD

ExtensibleOpen-SourceFault-TolerantHorizontally ScalableCloud-Native

Machine Learning

Enrich Filter

Split

Dashboard

Indicators

1

2

Predict

3

Real data

Simulator

/Stocks

/TechIndicators

/Predictions

Page 38: Apache Geode Meetup, Cork, Ireland at CIT

• Off-heap memory storage

• HDFS persistence

• Lucene indexes

• Spark connector

• Cloud Foundry service

• Distributed transactions

…and other ideas from the Geode community!

Roadmap

Page 39: Apache Geode Meetup, Cork, Ireland at CIT

How to Get Involved

• http://geode.incubator.apache.org

• Join the mailing lists; ask a question, answer a question, learn

[email protected] [email protected]

• File a bug in JIRA

• Update the wiki, website, or documentation

• Create example applications

• Use it in your project! We need you!

Page 40: Apache Geode Meetup, Cork, Ireland at CIT

Questions?

Thank you!