Webinar: MongoDB and Hadoop - Working Together to Provide Business Insights

Post on 26-Jan-2015

DESCRIPTION

Join us for a webinar on how MongoDB and Hadoop can work together to solve Big Data problems in today's enterprises. We will take an in-depth look at how the two technologies make real business intelligence accessible to end users. After a brief introduction to both technologies, this webinar will dive deep into the MongoDB+Hadoop Connector and how it is applied to enable new business insights.

In this webinar you will learn:
• What information problems are a good fit for MongoDB and Hadoop
• How to integrate the two technologies using the MongoDB+Hadoop Connector
• Programming paradigms for tackling common problems

TRANSCRIPT

MongoDB & Hadoop: Providing Business Insights

Thomas Boyd, Senior Solutions Architect, MongoDB

2

What is MongoDB?

The leading NoSQL database

Document Database

Open-Source

General Purpose

3

MongoDB Document Model

[Figure panels: RDBMS, MongoDB]

{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
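As a rough illustration of the document model on this slide, the same employee record can be written as a plain Python dictionary. This sketch does not talk to a MongoDB server. The helper name `plans_by_type` is hypothetical, and `_id` is kept as a plain string rather than an ObjectId. It only shows that the embedded `benefits` array travels with its parent document, where a relational schema would typically use a second table and a join.

```python
# The employee document from the slide as a plain Python dict.
# Assumption: _id is a plain string here; in MongoDB it would be an ObjectId.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def plans_by_type(doc):
    """Hypothetical helper: map each embedded benefit type to its plan."""
    return {b["type"]: b["plan"] for b in doc.get("benefits", [])}


print(plans_by_type(employee))  # {'Health': 'PPO Plus', 'Dental': 'Standard'}
```

No join is needed to answer "which plans does this employee have?" because the answer is already inside the document.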

4

What is Hadoop?

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.”*

*source: hadoop.apache.org

• Large datasets
• Analytics
• Batch
• Map-Reduce
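To make the Map-Reduce bullet concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in Python. It is not Hadoop code and does not distribute any work. The function names and the sample documents are made up for illustration. Hadoop applies the same three-phase pattern across a cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: count documents per department.
def department_mapper(doc):
    return [(doc["department"], 1)]


def count_reducer(key, values):
    return sum(values)


# Made-up sample input.
docs = [
    {"department": "Marketing"},
    {"department": "Marketing"},
    {"department": "Sales"},
]

result = reduce_phase(shuffle(map_phase(docs, department_mapper)), count_reducer)
print(result)  # {'Marketing': 2, 'Sales': 1}
```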

5

Enterprise IT Stack

[Diagram: layered stack]
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management: Online Data and Offline Data (systems shown: RDBMS, EDW, Hadoop)
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Across all layers: Management & Monitoring, Security & Auditing

6

Consideration: Online vs. Offline

Online
• Real-time
• Low-latency
• High availability

Offline
• Long-running
• High-latency
• Availability is lower priority

8

Hadoop is good for…

• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake

9

MongoDB is good for…

• 360 Degree View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub

10

MongoDB and Hadoop: Complementary

Hadoop
• “Data Lake”
• In-depth analytics

MongoDB
• Real-time systems
• Light-weight analytical workloads

11

Use MongoDB+Hadoop Together

E-Commerce
• Products & Inventory
• Real-time recommendations
• Customer profile
• Session management
• Customer clickstream
• Fraud detection

Analysis
• Transaction history
• Clickstream history
• Recommendation model
• Fraud modeling

Connected via the MongoDB Connector for Hadoop

12

Example – Fraud Detection

Payments
• Online payments processing

Nightly Analysis
• Fraud modeling

Other components shown
• Results Cache
• 3rd Party Data Sources
• Fraud Detection (two links labeled "query only")

Connected via the MongoDB Connector for Hadoop
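The split between nightly analysis and online, query-only lookups can be sketched roughly as follows. Everything here is an assumption for illustration: the "model" is just a mean payment amount per account, the threshold `factor` is arbitrary, and the function names and sample payments are made up. A real deployment would run the offline step in Hadoop and keep the results cache in a database.

```python
from statistics import mean


def nightly_analysis(payment_history):
    """Offline step: build a per-account profile from historical payments.

    Hypothetical model: the profile is just the mean payment amount.
    """
    amounts_by_account = {}
    for payment in payment_history:
        amounts_by_account.setdefault(payment["account"], []).append(payment["amount"])
    return {account: mean(amounts) for account, amounts in amounts_by_account.items()}


def is_suspicious(results_cache, payment, factor=5.0):
    """Online step: read-only lookup against the results cache.

    Returns None when the account has no profile yet.
    """
    baseline = results_cache.get(payment["account"])
    if baseline is None:
        return None
    return payment["amount"] > factor * baseline


# Made-up payment history.
history = [
    {"account": "a1", "amount": 20.0},
    {"account": "a1", "amount": 40.0},
    {"account": "a2", "amount": 500.0},
]
results_cache = nightly_analysis(history)  # a1 -> 30.0, a2 -> 500.0

print(is_suspicious(results_cache, {"account": "a1", "amount": 200.0}))  # True
print(is_suspicious(results_cache, {"account": "a2", "amount": 600.0}))  # False
print(is_suspicious(results_cache, {"account": "a3", "amount": 10.0}))   # None
```

The online check never recomputes the model. It only reads what the last batch run wrote, which keeps the payment path low-latency.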

13

Customer example – Global Travel Firm

Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)

Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine

Connected via the MongoDB Connector for Hadoop

14

Customer example – MetLife

Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection

Churn Analysis
• Customer action analysis
• Churn prediction algorithms

Connected via the MongoDB Connector for Hadoop

15

Customer example – Criteo

Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions

Algorithms
• User segmentation
• Recommendation engine
• Prediction engine

Connected via the MongoDB Connector for Hadoop

16

What is the MongoDB-Hadoop Connector?

• Java Map-Reduce, Stream Map-Reduce, Pig, & Hive access to MongoDB
  – MongoDB as input
    • mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
    • mongo.input.uri=mongodb://my-db:27017/db1.collection1
  – MongoDB as output
    • mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
    • mongo.output.uri=mongodb://my-db:27017/db1.collection2
  – Using MongoDB backup files
    • mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
    • mapred.output.dir=file:///results.bson
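The settings on this slide can be gathered in one place and rendered as Hadoop generic options. The sketch below is only a string-building helper, not part of the connector. The function names are hypothetical. It assumes the fully qualified class names live under `com.mongodb.hadoop` and that the output collection is set with `mongo.output.uri`; check both against the connector version you use.

```python
def connector_properties(input_uri, output_uri):
    """Collect the connector settings for a job that reads from and writes to MongoDB."""
    return {
        # Assumption: class names under the com.mongodb.hadoop package.
        "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
        "mongo.input.uri": input_uri,
        "mongo.job.output.format": "com.mongodb.hadoop.MongoOutputFormat",
        "mongo.output.uri": output_uri,
    }


def as_generic_options(properties):
    """Render properties as Hadoop generic options of the form -Dkey=value."""
    return [f"-D{key}={value}" for key, value in properties.items()]


props = connector_properties(
    "mongodb://my-db:27017/db1.collection1",
    "mongodb://my-db:27017/db1.collection2",
)
for option in as_generic_options(props):
    print(option)
```

The resulting strings could be appended to a `hadoop jar` invocation, provided the job parses generic options.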

17

Enhancing MongoDB-Hadoop Connector

• Version 1.1.0, July 2013
  – Pig support
  – Hive support
  – Streaming support
  – Read/Write MongoDB backups
  – Update writes
  – Much more…

• Version 1.2.0, December 2013
  – Apache Hadoop 2.2 support
  – Multiple collections as M-R source
  – Multiple mongos support
  – Custom splitting support
  – Performance improvements

18

MongoDB Native Analytics

• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework
• JavaScript Map-Reduce
• Client-side analytics
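As one example of the aggregation framework, a pipeline is just a list of stage documents. The sketch below defines a two-stage pipeline as plain Python data and, so that it runs without a server, pairs it with a client-side function that computes the same counts over in-memory documents. The sample documents and the function name are made up. With a driver such as PyMongo, the pipeline would be passed to a collection's `aggregate` method.

```python
from collections import Counter

# Aggregation pipeline expressed as plain data: unwind the embedded benefits
# array, then count the unwound entries per benefit type.
pipeline = [
    {"$unwind": "$benefits"},
    {"$group": {"_id": "$benefits.type", "count": {"$sum": 1}}},
]


def count_benefit_types(docs):
    """Client-side equivalent of the pipeline above for in-memory documents."""
    return Counter(b["type"] for doc in docs for b in doc.get("benefits", []))


# Made-up sample documents.
docs = [
    {"benefits": [{"type": "Health", "plan": "PPO Plus"},
                  {"type": "Dental", "plan": "Standard"}]},
    {"benefits": [{"type": "Health", "plan": "Standard"}]},
    {"benefits": []},
]

print(dict(count_benefit_types(docs)))  # {'Health': 2, 'Dental': 1}
```

Running the pipeline inside the database avoids shipping every document to the client, which is the main reason to prefer it over the client-side loop for large collections.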

19

Resources

White paper: Big Data: Examples and Guidelines for the Enterprise Decision Maker
http://www.mongodb.com/lp/whitepaper/big-data-nosql

Recorded Webinar Series: Thrive with Big Data
http://www.mongodb.com/lp/big-data-series

Recorded Webinar: What’s New with MongoDB Hadoop Integration
http://www.mongodb.com/presentations/webinar-whats-new-mongodb-hadoop-integration

Documentation: MongoDB Connector for Hadoop
http://docs.mongodb.org/ecosystem/tools/hadoop/

Trouble Tickets
http://jira.mongodb.org (project = Hadoop Integration)

Subscriptions, support, consulting, training
https://www.mongodb.com/products/how-to-buy
