Data Streaming with Apache Kafka & MongoDB

Andrew Morgan – MongoDB Product Marketing
David Tucker – Director, Partner Engineering and Alliances at Confluent

13th September 2016

Upload: confluent

Post on 14-Apr-2017



Agenda

Target Audience

Apache Kafka

MongoDB

Integrating MongoDB and Kafka

Kafka – What’s Next

Next Steps

Target Audience


Apache Kafka / Confluent Platform

What does Kafka do?

[Diagram labels: Producers, Consumers, Kafka Connect (shown twice), Topic]

Your interfaces to the world

Connected to your systems in real time
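To make the producer, topic, and consumer roles concrete, here is a minimal in-memory sketch in Python. It is not the Kafka client API: the `Topic` and `Consumer` classes, their method names, and the sample readings are all hypothetical, chosen only to illustrate an append-only log that several consumers read independently by tracking their own offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for a single-partition topic: an append-only log."""
    name: str
    log: list = field(default_factory=list)

    def append(self, message):
        """Producer side: add a message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def read_from(self, offset):
        """Consumer side: return every message at or after `offset`."""
        return self.log[offset:]


@dataclass
class Consumer:
    """Each consumer keeps its own position, so reading never removes data."""
    name: str
    offset: int = 0

    def poll(self, topic):
        messages = topic.read_from(self.offset)
        self.offset += len(messages)
        return messages


topic = Topic("sensor-readings")
for reading in ({"sensor": 1, "temp": 20.5}, {"sensor": 2, "temp": 22.1}):
    topic.append(reading)

dashboard = Consumer("dashboard")
archiver = Consumer("archiver")

first_batch = dashboard.poll(topic)       # the two messages written so far
topic.append({"sensor": 1, "temp": 20.9})
second_batch = dashboard.poll(topic)      # only the newly appended message
archive_batch = archiver.poll(topic)      # all three, read independently
```

Because each consumer owns its offset, a slow or newly added consumer can read the full log without affecting the others.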

What is Streaming Data

Synchronous Req/Response: 0 – 100s ms

Near Real Time: > 100s ms

Offline Batch: > 1 hour

[Diagram: KAFKA (Stream Data Platform) alongside Search, RDBMS, Apps, Monitoring, Real-time Analytics, NoSQL, and Stream Processing; HADOOP (Data Lake) alongside Impala, DWH, Hive, Spark, and Map-Reduce.]

Confluent’s Offerings

Core: Kafka, Connect, Streams, Java Client

Confluent Platform: More Clients, REST Proxy, Schema Registry, Pre-Built Connectors

Confluent Platform Enterprise: Stream Monitoring, Message Delivery, Connector Management

Confluent Platform: It’s Kafka ++

Features and benefits, compared across Apache Kafka, Confluent Platform, and Confluent Platform Enterprise:

Apache Kafka: High throughput, low latency, high availability, secure distributed message system

Kafka Connect: Advanced framework for connecting external sources/destinations into Kafka

Java Client: Provides easy integration into Java applications

Kafka Streams: Simple library that enables streaming application development within the Kafka framework

Additional Clients: Supports non-Java clients; C, C++, Python, etc.

REST Proxy: Provides universal access to Kafka from any network-connected device via HTTP

Schema Registry: Central registry for the format of Kafka data; guarantees all data is always consumable

Pre-Built Connectors: HDFS, JDBC and other connectors fully certified and fully supported by Confluent

Confluent Control Center: Includes Connector Management and Stream Monitoring

Support: Enterprise-class support to keep your Kafka environment running at top performance
  Apache Kafka: Community
  Confluent Platform: Community
  Confluent Platform Enterprise: 24x7x365

Pricing
  Apache Kafka: Free
  Confluent Platform: Free
  Confluent Platform Enterprise: Subscription

Common Kafka Use Cases

Data transport and integration
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Stock ticker data

Real-time stream processing
• Monitoring
• Asynchronous applications
• Fraud and security

People Using Kafka Today

Financial Services

Entertainment & Media

Consumer Tech

Travel & Leisure

Enterprise Tech

Telecom

Retail

MongoDB

Relational

Expressive Query Language & Secondary Indexes

Strong Consistency

Enterprise Management & Integrations

The World Has Changed

Data

Risk

Time

Cost

NoSQL

Scalability & Performance

Always On, Global Deployments

Flexibility

Nexus Architecture

Scalability & Performance

Always On, Global Deployments

Flexibility

Expressive Query Language & Secondary Indexes

Strong Consistency

Enterprise Management & Integrations

Integrating MongoDB and Kafka

Where MongoDB Fits

[Diagram, built up over five slides. Pipeline stages: two producers feeding Topic A and Topic B; a Filter on each topic; Merge into Topic C; Analyze into Topic D; Take Action. MongoDB appears as the Operational Database, connected to the pipeline by Store Results, Key Events, and Reference Data.]
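The flow in the diagram can be sketched with plain Python lists standing in for topics and a list standing in for the operational database. Everything here is an assumption made for illustration: the message fields, the threshold of 30, and the function names do not come from the slides, and no Kafka or MongoDB client is involved.

```python
def keep_valid(messages):
    """Filter stage: drop messages that carry no numeric reading."""
    return [m for m in messages if isinstance(m.get("value"), (int, float))]


def merge(*streams):
    """Merge stage: combine several streams and order them by timestamp."""
    return sorted((m for s in streams for m in s), key=lambda m: m["ts"])


def analyze(messages, threshold=30):
    """Analyze stage: flag readings above a hypothetical threshold."""
    return [{**m, "alert": m["value"] > threshold} for m in messages]


# Hypothetical contents of Topic A and Topic B.
topic_a = [{"ts": 1, "source": "A", "value": 21},
           {"ts": 4, "source": "A", "value": None}]
topic_b = [{"ts": 2, "source": "B", "value": 35},
           {"ts": 3, "source": "B", "value": 18}]

topic_c = merge(keep_valid(topic_a), keep_valid(topic_b))  # filtered, merged
topic_d = analyze(topic_c)                                 # analyzed

operational_db = []  # stand-in for a MongoDB collection
actions = []         # stand-in for whatever "Take Action" triggers

for event in topic_d:
    operational_db.append(event)  # Store Results
    if event["alert"]:
        actions.append(f"notify: source {event['source']} at ts {event['ts']}")
```

In this sketch every analyzed event is stored, while only flagged events trigger an action; a real deployment would choose what to persist based on its own requirements.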

Where K-Streams Fits

[Diagram: the same pipeline and Operational Database as above, with a Kafka Streams box in place of the Filter and Merge stages between Topics A, B, and C.]

MongoDB As a Kafka Producer

Design Pattern: Operationalized Data Lake

[Diagram, built up over five slides. Sources: Sensors, User Data, Clickstreams, Logs. Ingest: Message Queue and Kafka Streams, carrying Raw Data and Processed Events. Operational applications with Real-Time Access: Customer Data Mgmt, Mobile App, IoT App, Live Dashboards. Distributed Processing Frameworks providing Batch Processing and Batch Views: Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics.]

MongoDB: Millisecond latency. Expressive querying & flexible indexing against subsets of data. Updates in place. In-database aggregations & transformations.

HDFS: Multi-minute latency with scans across TB/PB of data. No indexes. Data stored in 128MB blocks. Write-once-read-many & append-only storage model.

Steps highlighted across the builds:
• Configure where to land incoming data
• Raw data processed to generate analytics models
• MongoDB exposes analytics models to operational apps. Handles real-time updates
• Compute new models against MongoDB & HDFS
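One way to picture "configure where to land incoming data" is a small routing table. The sketch below is an assumption-heavy illustration in plain Python: the record kinds, the rule that processed events land in both stores, and the store names are hypothetical and are not taken from the slides.

```python
# Hypothetical routing rules: which store(s) each kind of record lands in.
LANDING_RULES = {
    "raw": ("data_lake",),
    "processed_event": ("operational_db", "data_lake"),
}
DEFAULT_TARGETS = ("data_lake",)  # assumption: unknown kinds go to the lake


def land(record, stores, rules=LANDING_RULES):
    """Append the record to every store its kind is routed to."""
    targets = rules.get(record.get("kind"), DEFAULT_TARGETS)
    for target in targets:
        stores[target].append(record)
    return targets


# Lists stand in for MongoDB (operational_db) and HDFS (data_lake).
stores = {"operational_db": [], "data_lake": []}
incoming = [
    {"kind": "raw", "payload": "click /home"},
    {"kind": "processed_event", "payload": {"customer_id": 1, "churn_risk": 0.8}},
    {"kind": "unknown", "payload": "unparsed"},
]
routed = [land(record, stores) for record in incoming]
```

Keeping the rules in data rather than code means the landing policy can change without touching the ingest logic.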

Kafka – What’s Next

Kafka Connectors

• Confluent-supported connectors (included in CP)

• Community-written connectors (just a sampling)

JDBC

Kafka Futures

• Apache Core
  • Admin API (KIP-4)
  • Exactly-once delivery semantics
  • Time-based topic indexing

• Kafka Streams
  • Exactly-once processing semantics
  • Interactive Queries: enable real-time sharing of application state with other applications

• Confluent Platform Enterprise
  • Multi-cluster views and alerting added to Control Center

Next Steps

MongoDB Atlas

Database as a service for MongoDB

MongoDB Atlas is…

• Automated: The easiest way to build, launch, and scale apps on MongoDB

• Flexible: The only database as a service with all you need for modern applications

• Secured: Multiple levels of security available to give you peace of mind

• Scalable: Deliver massive scalability with zero downtime as you grow

• Highly available: Your deployments are fault-tolerant and self-healing by default

• High performance: The performance you need for your most demanding workloads

MongoDB Atlas Features

Run for You
• Spin up a cluster in seconds
• Replicated & always-on deployments
• Fully elastic: scale out or up in a few clicks with zero downtime
• Automatic patches & simplified upgrades for the newest MongoDB features

Safe & Secure
• Authenticated & encrypted
• Continuous backup with point-in-time recovery
• Fine-grained monitoring & custom alerts

No Lock-In
• On-demand pricing model; billed by the hour
• Multi-cloud support (AWS available with others coming soon)
• Part of a suite of products & services designed for all phases of your app; migrate easily to different environments (private cloud, on-prem, etc.) when needed

MongoDB Enterprise Advanced

Tooling
• MongoDB Ops Manager or MongoDB Cloud Manager Premium
• MongoDB Compass
• MongoDB Connector for BI

Security
• Encrypted Storage Engine
• LDAP / Kerberos Integration
• DDL & DML Auditing
• FIPS 140-2 Support

Support
• 24 x 7 Support
• 1 hr SLA
• Emergency Patches
• Customer Success Program
• On-Demand Training

License
• Commercial License

Old Billingsgate, London, 15th November

mongodb.com/europe

Use my discount code for 20% off: andrewmorgan20

Document Data Model

MongoDB

{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { number : "1-212-777-1212", dnc : true, type : "home" },
    { number : "1-212-777-1213", type : "cell" }
  ]
}

Relational

Customer ID  First Name  Last Name  City
0            John        Doe        New York
1            Mark        Smith      San Francisco
2            Jay         Black      Newark
3            Meagan      White      London
4            Edward      Daniels    Boston

Phone Number    Type  DNC     Customer ID
1-212-555-1212  home  T       0
1-212-555-1213  home  T       0
1-212-555-1214  cell  F       0
1-212-777-1212  home  T       1
1-212-777-1213  cell  (null)  1
1-212-888-1212  home  F       2
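The mapping between the two models can be sketched in a few lines of Python. This is only an illustration using rows from the tables above; the `to_document` helper is hypothetical, and no database is involved. Note how the null DNC value in the relational row simply becomes an absent field in the document.

```python
# A subset of the relational rows shown above, as Python dictionaries.
customers = [
    {"customer_id": 0, "first_name": "John", "last_name": "Doe", "city": "New York"},
    {"customer_id": 1, "first_name": "Mark", "last_name": "Smith", "city": "San Francisco"},
]
phones = [
    {"number": "1-212-555-1212", "type": "home", "dnc": True, "customer_id": 0},
    {"number": "1-212-777-1212", "type": "home", "dnc": True, "customer_id": 1},
    {"number": "1-212-777-1213", "type": "cell", "dnc": None, "customer_id": 1},
]


def to_document(customer, phone_rows):
    """Embed a customer's phone rows as an array, omitting null columns."""
    embedded = []
    for row in phone_rows:
        if row["customer_id"] != customer["customer_id"]:
            continue
        # Drop the foreign key (implied by embedding) and any null values.
        phone = {k: v for k, v in row.items()
                 if k != "customer_id" and v is not None}
        embedded.append(phone)
    return {**customer, "phones": embedded}


mark = to_document(customers[1], phones)
```

The resulting `mark` dictionary has the same shape as the MongoDB document on the slide: one record holding the customer and both phone numbers, with no join needed to read it back.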

Document Model Benefits

{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { number : "1-212-777-1212", dnc : true, type : "home" },
    { number : "1-212-777-1213", type : "cell" }
  ]
}

Agility and flexibility
• Data model supports business change
• Rapidly iterate to meet new requirements

Intuitive, natural data representation
• Eliminates ORM layer
• Developers are more productive

Reduces the need for joins, disk seeks
• Programming is simpler
• Performance delivered at scale

Rich Functionality

Expressive Queries
• Find anyone with phone # "1-212…"
• Check if the person with number "555…" is on the "do not call" list

Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave

Text Search
• Find all tweets that mention the firm within the last 2 days

Aggregation
• Count and sort number of customers by city

Native Binary JSON support
• Add an additional phone number to Mark Smith’s record without rewriting the document
• Select just the mobile phone number in the list
• Sort on the modified date

{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { number : "1-212-777-1212", dnc : true, type : "home" },
    { number : "1-212-777-1213", type : "cell" }
  ]
}

Left outer join ($lookup)
• Query for all San Francisco residents, look up their transactions, and sum the amount by person
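Two of the examples above can be written as aggregation pipelines. The sketch below only defines them as Python data structures and does not connect to a database; `$match`, `$group`, `$sort`, `$lookup`, and `$unwind` are standard aggregation stages, while the `transactions` collection, its `amount` field, and the sample customers are assumptions made for illustration. A plain-Python equivalent of the first pipeline shows the result shape to expect.

```python
from collections import Counter

# "Count and sort number of customers by city".
# With a driver such as PyMongo, a list like this would be passed to
# collection.aggregate(); here it is only defined, not executed.
customers_by_city = [
    {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# "Query for all San Francisco residents, look up their transactions, and
# sum the amount by person". The `transactions` collection and its
# `customer_id` and `amount` fields are hypothetical.
sf_spend_by_person = [
    {"$match": {"city": "San Francisco"}},
    {"$lookup": {
        "from": "transactions",
        "localField": "customer_id",
        "foreignField": "customer_id",
        "as": "transactions",
    }},
    # By default $unwind drops customers whose transactions array is empty.
    {"$unwind": "$transactions"},
    {"$group": {"_id": "$customer_id",
                "total": {"$sum": "$transactions.amount"}}},
]

# Plain-Python equivalent of customers_by_city over made-up sample data.
sample_customers = [
    {"customer_id": 1, "city": "San Francisco"},
    {"customer_id": 5, "city": "San Francisco"},
    {"customer_id": 0, "city": "New York"},
]
counts = Counter(c["city"] for c in sample_customers)
result = [{"_id": city, "count": n} for city, n in counts.most_common()]
```

Each stage consumes the documents produced by the previous one, which is why the `$match` is placed first: it narrows the input before the more expensive `$lookup` runs.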

MongoDB Technical Capabilities

[Diagram: Application, Driver, and Mongos in front of Shard 1, Shard 2, … Shard N; each shard shows one Primary and two Secondaries.]

db.customer.insert({…})
db.customer.find({ name: "John Smith" })

1. Dynamic Document Schema
   { name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: { home: 1234567890, mobile: 1234568138 } }

2. Native language drivers

3. High availability
   - Replica sets

4. High performance
   - Data locality
   - Indexes
   - RAM

5. Horizontal scalability
   - Sharding
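As a rough illustration of horizontal scalability, the following Python sketch routes documents to shards by hashing a shard key. It is a deliberate simplification and an assumption on my part: MongoDB assigns ranges of key values (or of hashed key values) to shards and tracks them in cluster metadata, rather than using a fixed modulo. The `Router` class and `shard_for` function are hypothetical names.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard index by hashing the shard key (simplified scheme)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class Router:
    """Hypothetical stand-in for the routing role mongos plays."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]

    def insert(self, doc, key="customer_id"):
        index = shard_for(doc[key], len(self.shards))
        self.shards[index].append(doc)
        return index

    def find(self, key_value, key="customer_id"):
        # A query that includes the shard key touches only one shard.
        index = shard_for(key_value, len(self.shards))
        return [d for d in self.shards[index] if d[key] == key_value]


router = Router(shard_count=3)
placements = [router.insert({"customer_id": i, "name": f"customer-{i}"})
              for i in range(10)]
found = router.find(4)
```

The point of the sketch is the targeting behaviour: because placement is a pure function of the shard key, a lookup by that key needs to consult a single shard instead of all of them.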

MongoDB Use Cases

• Single View
• Internet of Things
• Mobile
• Real-Time Analytics
• Catalog
• Personalization
• Content Management