apache geode meetup, cork, ireland at cit

Post on 15-Apr-2017

611 Views

Category:

Software

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Introduction to Apache Geode (incubating)

Cork, September 2015

Anthony Baker (@metatype) William Markito (@william_markito)

• Introduction to Geode

• Geode concepts and usage

• The Geode open source project

• StockPrediction demo

Agenda

Introduction

4

"…an in-memory, distributed database with strong consistency built to support low latency transactional applications at extreme scale.”

History

2004 2008 2014

•  Massive increase in data volumes

•  Falling margins per transaction

•  Increasing cost of IT maintenance

•  Need for elasticity in systems

•  Financial Services Providers (every major Wall Street bank)

•  Department of Defense

•  Real Time response needs •  Time to market constraints •  Need for flexible data

models across enterprise •  Distributed development •  Persistence + In-memory

•  Global data visibility needs •  Fast Ingest needs for data •  Need to allow devices to

hook into enterprise data •  Always on

•  Largest travel Portal •  Airlines •  Trade clearing •  Online gambling

•  Largest Telcos •  Large mfrers •  Largest Payroll processor •  Auto insurance giants •  Largest rail systems on

earth

• 1000+ customers in production • Cutting edge use cases

Interesting use cases

China RailwayCorporation

5,700 train stations4.5 million tickets per day20 million daily users1.4 billion page views per day40,000 visits per second

*http://pivotal.io/big-data/pivotal-gemfire

Indian Railways

7,000 stations72,000 miles of track23 million passengers daily120,000 concurrent users10,000 transactions per minute

Interesting use cases

Indian RailwaysChina Railway Corporation

World: ~7,349,000,000

~36% of the world population

Population: 1,251,695,6161,401,586,609

8

Application Patterns

• Caching for speed and scale ➡ Read-Through, Write-Through, Write-Behind

• As the OLTP system of record ➡ Data in-memory for low-latency, on disk for durability

• Parallel compute engine • Real-time analytics

Concepts

• Cache

• Region

• Member

• Client Cache

• Functions

• Listeners

Geode Concepts and Usage

• Cache

• In-memory storage and management for your data

• Configurable through XML, Spring, Java API or CLI

• Collection of Region

Region

Region

Region

Cache

JVM

Concepts

Concepts

• Region

• Distributed java.util.Map on steroids (Key/Value)

• Consistent API regardless of where or how data is stored

• Observable (reactive)

• Highly available, redundant on cache Member (s).

• Querying

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Concepts

• Region

• Local, Replicated or Partitioned

• In-memory or persistent

• Redundant

• LRU

• Overflow

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

LOCALLOCAL_HEAP_LRULOCAL_OVERFLOWLOCAL_PERSISTENTLOCAL_PERSISTENT_OVERFLOWPARTITIONPARTITION_HEAP_LRUPARTITION_OVERFLOWPARTITION_PERSISTENTPARTITION_PERSISTENT_OVERFLOWPARTITION_PROXYPARTITION_PROXY_REDUNDANTPARTITION_REDUNDANTPARTITION_REDUNDANT_HEAP_LRUPARTITION_REDUNDANT_OVERFLOWPARTITION_REDUNDANT_PERSISTENTPARTITION_REDUNDANT_PERSISTENT_OVERFLOWREPLICATEREPLICATE_HEAP_LRUREPLICATE_OVERFLOWREPLICATE_PERSISTENTREPLICATE_PERSISTENT_OVERFLOWREPLICATE_PROXY

• Persistent Regions

• Durability

• WAL for efficient writing

• Consistent recovery

• Compaction

Concepts

Modify k1->v5

Create k6->v6

Create k2->v2

Create k4->v4 Oplog2.crf

Member 1

Modify k4->v7 Oplog3.crf

Put k4->v7

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Server 1 Server N

• Member

• A process that has a connection to the system

• A process that has created a cache

• Embeddable within your application

Concepts

Client

Locator

Server

Concepts

• Client cache

• A process connected to the Geode server(s)

• Can have a local copy of the data

• Can be notified about events on the servers

Application

GemFire Server

Region

Region

Region Client Cache

Concepts

• Functions

• Used for distributed concurrent processing (Map/Reduce, stored procedure)

• Highly available

• Data oriented

• Member oriented

Submit (f1)

f1 , f2 , … fn

Execute Functions

Concepts

• Functions

Server

Server

FunctionService.onRegion.withFilter.execute ResultCollector.getResult

Server Distributed System

execute

Server

Server

6

1

result

execute

execute

result result

2

5

3

4 3 4

Server

Partitioned Region Data Store - X

Partitioned Region Data Store - Y

Partitioned Region Data Store - Z

Partitioned Region Data Accessor

Partitioned Region Data Accessor

filter = Keys X, Y Client Region

Concepts

• Listeners

• CacheWriter / CacheListener

• AsyncEventListener (queue / batch)

• Parallel or Serial

• Conflation

Core Beliefs

Performance

Consistency

Resiliency

Why Apache Geode?

© Copyright 2014 Pivotal. All rights reserved.

Pivotal GemFire High Availability and Fault Tolerance in 6 acts

Failing data copies are replaced transparently

Data is replicated to other clusters and sites (WAN)

Network segmentations are identified and fixed automatically

Client and cluster disconnections are handled gracefully

Data is persisted on local disk for ultimate durability

“split brain”

Failed function executions are restarted automatically

restart

• Minimize copying

• Minimize contention points

• Flexible consistency model

• Partitioning and parallelism

• Avoid disk seeks

What makes it fast?

Benchmarks and Testing

0

2

4

6

8

10

12

14

16

18

0

1

2

3

4

5

6

2 4 6 8 10

Spee

dup

Server(Hosts

speedup

latency4(ms)

CPU4%

The Geode Project

Why Open Source? Why ASF?

• Open source is fundamentally changing software buying patterns

• Customers get transparency and co-development of features

• It’s the community that matters

• ASF provides a framework for open source

Geode Will Be a Significant Apache Project

• 1M+ LOC, over a 1000 person years invested into cutting edge R&D

• Thousands of production customers in very demanding verticals

• Cutting edge use cases that have shaped product thinking

• A core technology team that has stayed together since founding

• Performance differentiators that are baked into every aspect of the product

Geode versus GemFire

• Geode is a project supported by the OSS community

• GemFire is product from Pivotal, based on Geode source

• We donated everything but the kitchen sink*

• Development process follows “The Apache Way”

* Multi-site WAN replication, continuous queries, and native (C/C++/.NET) client

"Talk is cheap, show me the code"

• Clone & Build

gitclonehttps://github.com/apache/incubator-geodecdincubator-geode./gradlewbuild-Dskip.tests=true

• Start a server

cdgemfire-assembly/build/install/apache-geode./bin/gfshgfsh>startlocator--name=locatorgfsh>startserver--name=servergfsh>createregion--name=myRegion--type=REPLICATE

29

&

Stock Predictions with

Apache Geode, Spark, and SpringXD

• RDD • Dataframe • Driver • Worker

Quick intro to Apache Spark

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."

• RDD • Dataframe • Driver • Worker

Quick intro to Apache Spark

“A dataframe is a distributed collection of rows organized into named columns. An abstraction for selecting, filtering and plotting structured data (pandas), previously known as SchemaRDD."

• RDD • Dataframe • Driver • Worker

Quick intro to Apache Spark

Live Data

Apache Geode / GemFire1- Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

Machine Learning model

4 - Re-training is triggered, updating the model with the latest historical data

Spring XD

Spring XD

Data Temperature

Hot

Warm

Machine Learning Concepts

medium avg (x+1)

relative strength (x)

medium avg (x)

price(x)

Machine Learning Model (e.g. Linear Regression)

medium avg (x+1)

relative strength (x)

medium avg (x)

price(x)

Machine Learning Model (e.g. Linear Regression)

Features Label

Transform Sink

SpringXD

ExtensibleOpen-SourceFault-TolerantHorizontally ScalableCloud-Native

Machine Learning

Enrich Filter

Split

Dashboard

Indicators

1

2

Predict

3

Real data

Simulator

/Stocks

/TechIndicators

/Predictions

• Off-heap memory storage

• HDFS persistence

• Lucene indexes

• Spark connector

• Cloud Foundry service

• Distributed transactions

…and other ideas from the Geode community!

Roadmap

How to Get Involved

• http://geode.incubator.apache.org

• Join the mailing lists; ask a question, answer a question, learn

dev@geode.incubator.apache.org user@geode.incubator.apache.org

• File a bug in JIRA

• Update the wiki, website, or documentation

• Create example applications

• Use it in your project! We need you!

Questions?

Thank you!

top related