Webinar: Data Streaming with Apache Kafka & MongoDB

#MongoDBWebinar | @mongodb

Andrew Morgan – MongoDB Product Marketing

David Tucker – Director, Partner Engineering and Alliances at Confluent

13th September 2016

Agenda

Target Audience

Apache Kafka

MongoDB

Integrating MongoDB and Kafka

Kafka – What’s Next

Next Steps

Target Audience

Apache Kafka / Confluent Platform

What does Kafka do?

[Diagram labels: Producers, Consumers, Kafka Connect (shown twice), Topic]

Your interfaces to the world

Connected to your systems in real time
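
As a rough illustration of the producer, topic, and consumer roles named on this slide, here is a minimal in-memory sketch. It is not the Kafka client API: the `Topic` class, its methods, the topic name, and the group names are all hypothetical, and a real topic is partitioned, replicated, and persisted. The sketch only shows the idea of an append-only log that independent consumer groups read at their own offsets.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Topic:
    """Toy stand-in for a single-partition topic: an append-only log."""
    name: str
    log: List[Any] = field(default_factory=list)
    # Each consumer group tracks its own read position (offset) in the log.
    offsets: Dict[str, int] = field(default_factory=dict)

    def produce(self, message: Any) -> int:
        """Append a message and return the offset it was written at."""
        self.log.append(message)
        return len(self.log) - 1

    def consume(self, group: str, max_messages: int = 10) -> List[Any]:
        """Return unread messages for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


# Hypothetical topic and messages, for illustration only.
topic = Topic("sensor-readings")
topic.produce({"sensor": "a", "temp": 21.5})
topic.produce({"sensor": "b", "temp": 19.0})

# Two independent consumer groups each see every message.
dashboard = topic.consume("dashboard")
archiver = topic.consume("archiver")
```

Because each group keeps its own offset, adding a new consumer does not affect existing ones, which is what lets one stream feed several downstream systems.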

What is Streaming Data

Synchronous Req/Response: 0 – 100s ms

Near Real Time: > 100s ms

Offline Batch: > 1 hour

[Diagram labels: KAFKA Stream Data Platform; Search; RDBMS; Apps; Monitoring; Real-time Analytics; NoSQL; Stream Processing; HADOOP Data Lake; Impala; DWH; Hive; Spark; Map-Reduce]

Confluent’s Offerings

Core: Connect, Streams, Java Client, Kafka

Confluent Platform: More Clients, REST Proxy, Schema Registry, Pre-Built Connectors

Confluent Platform Enterprise: Stream Monitoring, Message Delivery, Connector Management

Confluent Platform: It’s Kafka ++

Columns: Feature, Benefit, Apache Kafka, Confluent Platform, Confluent Platform Enterprise

Apache Kafka: High throughput, low latency, high availability, secure distributed message system

Kafka Connect: Advanced framework for connecting external sources/destinations into Kafka

Java Client: Provides easy integration into Java applications

Kafka Streams: Simple library that enables streaming application development within the Kafka framework

Additional Clients: Supports non-Java clients; C, C++, Python, etc.

REST Proxy: Provides universal access to Kafka from any network connected device via HTTP

Schema Registry: Central registry for the format of Kafka data; guarantees all data is always consumable

Pre-Built Connectors: HDFS, JDBC and other connectors fully certified and fully supported by Confluent

Confluent Control Center: Includes Connector Management and Stream Monitoring

Support: Enterprise class support to keep your Kafka environment running at top performance (Apache Kafka: Community; Confluent Platform: Community; Confluent Platform Enterprise: 24x7x365)

Apache Kafka: Free; Confluent Platform: Free; Confluent Platform Enterprise: Subscription

Common Kafka Use Cases

Data transport and integration
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Stock ticker data

Real-time stream processing
• Monitoring
• Asynchronous applications
• Fraud and security

People Using Kafka Today

• Financial Services
• Entertainment & Media
• Consumer Tech
• Travel & Leisure
• Enterprise Tech
• Telecom
• Retail

MongoDB

Relational

• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations

The World Has Changed

Data, Risk, Time, Cost

NoSQL

• Scalability & Performance
• Always On, Global Deployments
• Flexibility
• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations

Nexus Architecture

• Scalability & Performance
• Always On, Global Deployments
• Flexibility
• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations

Integrating MongoDB and Kafka

Where MongoDB Fits

[Diagram, built up over several slides. Labels: Prod → Topic A; Prod → Topic B; Filter; Filter; Merge → Topic C; Analyze → Topic D; Take Action. MongoDB is added as the Operational Database, with Store Results, Key Events, and Reference Data.]

Where K-Streams Fits

[Diagram labels: Prod → Topic A; Prod → Topic B; Kafka Streams (shown where the Filter and Merge stages appeared); Topic C; Analyze → Topic D; Take Action; Store Results; Key Events; Reference Data; Operational Database]
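
The filter, merge, and analyze stages in these diagrams can be sketched in plain Python to show how data moves from topic to topic. This is only a stand-in, not the Kafka Streams API (which is a Java library): the event fields `ts`, `source`, and `value`, the threshold, and every function name below are assumptions made for the illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def filter_events(events: Iterable[Event], min_value: float) -> Iterator[Event]:
    """Filter stage: keep only events at or above a threshold."""
    for event in events:
        if event["value"] >= min_value:
            yield event


def merge(*streams: Iterable[Event]) -> List[Event]:
    """Merge stage: combine several streams, ordered by timestamp."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: event["ts"])


def analyze(events: Iterable[Event]) -> Dict[str, float]:
    """Analyze stage: total value per source."""
    totals: Dict[str, float] = {}
    for event in events:
        totals[event["source"]] = totals.get(event["source"], 0.0) + event["value"]
    return totals


# Hypothetical contents of Topic A and Topic B.
topic_a = [
    {"ts": 1, "source": "a", "value": 5.0},
    {"ts": 3, "source": "a", "value": 0.5},
]
topic_b = [{"ts": 2, "source": "b", "value": 7.0}]

# Topic C holds the filtered, merged stream; Topic D holds the analysis result.
topic_c = merge(filter_events(topic_a, 1.0), filter_events(topic_b, 1.0))
topic_d = analyze(topic_c)
```

In the diagrams, the results in Topic D would then drive an action or be stored in the operational database.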

MongoDB As a Kafka Producer

Design Pattern: Operationalized Data Lake

[Diagram labels: Sensors, User Data, Clickstreams, Logs; Message Queue; Kafka Streams; Raw Data; Processed Events; Distributed Processing Frameworks; Customer Data Mgmt, Mobile App, IoT App, Live Dashboards; Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics]

Real-Time Access: Millisecond latency. Expressive querying & flexible indexing against subsets of data. Updates in place. In-database aggregations & transformations.

Batch Processing, Batch Views: Multi-minute latency with scans across TB/PB of data. No indexes. Data stored in 128MB blocks. Write-once-read-many & append-only storage model.

The slide builds up in four steps:

• Configure where to land incoming data
• Raw data processed to generate analytics models
• MongoDB exposes analytics models to operational apps. Handles real-time updates
• Compute new models against MongoDB & HDFS

https://www.mongodb.com/presentations/replacing-traditional-technologies-mongodb-single-platform-all-financial-data-ahl

http://www.slideshare.net/danharvey/change-data-capture-with-mongodb-and-kafka

Kafka – What’s Next

Kafka Connectors

• Confluent-supported connectors (included in CP), e.g. JDBC
• Community-written connectors (just a sampling)

Kafka Futures

• Apache Core
  • Admin API (KIP-4)
  • Exactly-once delivery semantics
  • Time-based topic indexing
• Kafka Streams
  • Exactly-once processing semantics
  • Interactive Queries: enable real-time sharing of application state with other applications
• Confluent Platform Enterprise
  • Multi-cluster views and alerting added to Control Center

Next Steps

MongoDB Atlas

Database as a service for MongoDB

MongoDB Atlas is…

• Automated: The easiest way to build, launch, and scale apps on MongoDB

• Flexible: The only database as a service with all you need for modern applications

• Secured: Multiple levels of security available to give you peace of mind

• Scalable: Deliver massive scalability with zero downtime as you grow

• Highly available: Your deployments are fault-tolerant and self-healing by default

• High performance: The performance you need for your most demanding workloads

MongoDB Atlas Features

Run for You
• Spin up a cluster in seconds
• Replicated & always-on deployments
• Fully elastic: scale out or up in a few clicks with zero downtime
• Automatic patches & simplified upgrades for the newest MongoDB features

Safe & Secure
• Authenticated & encrypted
• Continuous backup with point-in-time recovery
• Fine-grained monitoring & custom alerts

No Lock-In
• On-demand pricing model; billed by the hour
• Multi-cloud support (AWS available with others coming soon)
• Part of a suite of products & services designed for all phases of your app; migrate easily to different environments (private cloud, on-prem, etc.) when needed

MongoDB Enterprise Advanced

Tooling
• MongoDB Ops Manager or MongoDB Cloud Manager Premium
• MongoDB Compass
• MongoDB Connector for BI

Security
• Encrypted Storage Engine
• LDAP / Kerberos Integration
• DDL & DML Auditing
• FIPS 140-2 Support

Support
• 24 x 7 Support
• 1 hr SLA
• Emergency Patches
• Customer Success Program
• On-Demand Training

License
• Commercial License

Resources

• Data Streaming with Apache Kafka & MongoDB
  https://www.mongodb.com/collateral/data-streaming-with-apache-kafka-and-mongodb
• Implementing a Kafka Consumer for MongoDB
  https://www.mongodb.com/blog/post/mongodb-and-data-streaming-implementing-a-mongodb-kafka-consumer
• Tailing the Oplog on a sharded MongoDB Cluster
  https://www.mongodb.com/blog/post/tailing-mongodb-oplog-sharded-clusters

Old Billingsgate, London, 15th November

mongodb.com/europe

Use my discount code for 20% off: andrewmorgan20

Document Data Model

MongoDB

{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [
    {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }]
}

Relational

Customer ID  First Name  Last Name  City
0            John        Doe        New York
1            Mark        Smith      San Francisco
2            Jay         Black      Newark
3            Meagan      White      London
4            Edward      Daniels    Boston

Phone Number    Type  DNC     Customer ID
1-212-555-1212  home  T       0
1-212-555-1213  home  T       0
1-212-555-1214  cell  F       0
1-212-777-1212  home  T       1
1-212-777-1213  cell  (null)  1
1-212-888-1212  home  F       2
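
To make the mapping between the two representations concrete, the following sketch folds rows from the customer and phone tables into a single embedded document. It is an illustration only: the helper name `to_document` and the choice to drop NULL columns (so that the cell phone has no `dnc` field, as in the document on the slide) are assumptions rather than anything the slide prescribes.

```python
from typing import Any, Dict, List

# A subset of the rows from the two relational tables on the slide.
customers = [
    {"customer_id": 0, "first_name": "John", "last_name": "Doe", "city": "New York"},
    {"customer_id": 1, "first_name": "Mark", "last_name": "Smith", "city": "San Francisco"},
]
phones = [
    {"number": "1-212-555-1212", "type": "home", "dnc": True, "customer_id": 0},
    {"number": "1-212-777-1212", "type": "home", "dnc": True, "customer_id": 1},
    {"number": "1-212-777-1213", "type": "cell", "dnc": None, "customer_id": 1},
]


def to_document(customer: Dict[str, Any], phone_rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Embed a customer's phone rows as an array inside the customer document."""
    doc = dict(customer)
    doc["phones"] = []
    for row in phone_rows:
        if row["customer_id"] != customer["customer_id"]:
            continue
        # Drop the foreign key and any NULL columns; the document simply
        # omits fields that have no value.
        phone = {k: v for k, v in row.items() if k != "customer_id" and v is not None}
        doc["phones"].append(phone)
    return doc


mark = to_document(customers[1], phones)
```

The resulting `mark` dictionary has the same shape as the MongoDB document on the slide: one record per customer, with the phone numbers nested inside it rather than joined in from a second table.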

Document Model Benefits

Agility and flexibility
• Data model supports business change
• Rapidly iterate to meet new requirements

Intuitive, natural data representation
• Eliminates ORM layer
• Developers are more productive

Reduces the need for joins, disk seeks
• Programming is simpler
• Performance delivered at scale

Rich Functionality

Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not call” list

Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave

Text Search
• Find all tweets that mention the firm within the last 2 days

Aggregation
• Count and sort number of customers by city

Native Binary JSON support
• Add an additional phone number to Mark Smith’s record without rewriting the document
• Select just the mobile phone number in the list
• Sort on the modified date

Left outer join ($lookup)
• Query for all San Francisco residents, look up their transactions, and sum the amount by person
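
The `$lookup` example could be expressed as an aggregation pipeline along the following lines. This is a sketch under stated assumptions: the slide does not show the schema, so the `transactions` collection and its `customer_id` and `amount` fields are hypothetical, as is the helper name `spend_by_resident`. The pipeline is built as plain Python data so that it can be inspected without a running server.

```python
from typing import Any, Dict, List


def spend_by_resident(city: str) -> List[Dict[str, Any]]:
    """Build a pipeline that totals each resident's transactions.

    Assumes a hypothetical `transactions` collection whose documents carry
    `customer_id` and `amount` fields.
    """
    return [
        # 1. Keep only customers living in the given city.
        {"$match": {"city": city}},
        # 2. Left outer join: attach each customer's transactions as an array.
        #    Customers with no transactions get an empty array.
        {"$lookup": {
            "from": "transactions",
            "localField": "customer_id",
            "foreignField": "customer_id",
            "as": "transactions",
        }},
        # 3. Sum the joined amounts per person.
        {"$project": {
            "first_name": 1,
            "last_name": 1,
            "total": {"$sum": "$transactions.amount"},
        }},
    ]


pipeline = spend_by_resident("San Francisco")
# With PyMongo and a live deployment, this could be run as
# db.customer.aggregate(pipeline), assuming the collection names above.
```

Summing in `$project` rather than unwinding and grouping keeps customers who have no transactions in the result, which matches the left outer join semantics named on the slide.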

MongoDB Technical Capabilities

[Diagram: Application → Driver → Mongos → Shard 1, Shard 2, … Shard N; each shard has one Primary and two Secondaries]

db.customer.insert({…})
db.customer.find({ name: "John Smith" })

1. Dynamic Document Schema

{ name: "John Smith",
  date: "2013-08-01",
  address: "10 3rd St.",
  phone: {
    home: 1234567890,
    mobile: 1234568138 }
}

2. Native language drivers

3. High availability
   - Replica sets

4. High performance
   - Data locality
   - Indexes
   - RAM

5. Horizontal scalability
   - Sharding

MongoDB Use Cases

• Single View
• Internet of Things
• Mobile
• Real-Time Analytics
• Catalog
• Personalization
• Content Management