MongoDB & Hadoop - Understanding Your Big Data


Upload: mongodb

Post on 06-May-2015


DESCRIPTION

Big Data is the evolution of supercomputing for commercial enterprises and governments. Originally the domain of companies operating at Internet scale, Big Data today helps organizations of all sizes discover patterns and gain insight into their business. But understanding the differences between the plethora of new technologies can be daunting. Graph, columnar, key-value, and document stores are all called NoSQL, but which is best? And how does Hadoop fit into this ecosystem, given that its low cost and high efficiency have made it very popular? In this webinar, we will explore:

• The full spectrum of Big Data
• Hadoop and MongoDB: friends or frenemies?
• Differences between Systems of Record and Systems of Engagement
• MongoDB customer examples of Systems of Engagement

TRANSCRIPT

Page 1: MongoDB & Hadoop - Understanding Your Big Data

Hadoop & MongoDB Understanding your Big Data

Page 2: MongoDB & Hadoop - Understanding Your Big Data


MongoDB World

Page 3: MongoDB & Hadoop - Understanding Your Big Data

Speakers

Jnan Dash
Senior [email protected]

Kelly Stirman
Director of [email protected]

Page 4: MongoDB & Hadoop - Understanding Your Big Data

Jnan Dash

• Last 12 years (2002-Now) – Executive Consultant, on the board and advisory board of several new software companies, including Big Data players such as MongoDB
• 10 years (1992-2002) – Oracle, Group Vice President, Systems Architecture and Technology, responsible for the server product planning and rollout
• 16 years (1975-1992) – IBM, planner, architect, and development manager for the DB2 product line at Silicon Valley Lab and Austin Lab. Head of IBM’s database architecture, strategy, and technology

Page 5: MongoDB & Hadoop - Understanding Your Big Data

Why am I excited about MongoDB?

• Finally, some real innovation in DBMS
• MongoDB momentum is unprecedented!
• The changing landscape needs MongoDB
– “Internet scale” distributed operations + highly flexible data model for agile development + open source
• Perfect fit for cloud, mobility, and big data

Page 6: MongoDB & Hadoop - Understanding Your Big Data

Agenda

• Big Data - Observations
• Evolution of Database Technology
• Hadoop+MongoDB
• Customer Examples
• Roadmap
• Summary

Page 7: MongoDB & Hadoop - Understanding Your Big Data

The Fourth Paradigm

1. Thousand years ago – Experimental Science: description of natural phenomena
2. Last few hundred years – Theoretical Science: Newton’s Laws, Maxwell’s Equations, ...
3. Last few decades – Computational Science: simulation of complex phenomena
4. Today – Data-intensive Science: scientists overwhelmed with data deluge

Unify theory, experiment & simulation

Page 8: MongoDB & Hadoop - Understanding Your Big Data

Internet Scale Commercial Supercomputing

• Originated with companies operating at Internet scale (to process an ever-increasing number of users and volume of data)
– Yahoo in the 1990s, then Google, Facebook, Twitter
– They needed to do it quickly and affordably at scale
• Hadoop is the first commercial supercomputing software platform
– Works at scale, affordable at scale
• HPC was used for scientific supercomputing in meteorology and engineering. Big Data is the commercial equivalent of HPC
– Less about equations, more about discovery, patterns
• Many of the technologies have been around for decades
– Clustering
– Parallel processing
– Distributed file systems

Page 9: MongoDB & Hadoop - Understanding Your Big Data


Big Data: 3V’s

Page 10: MongoDB & Hadoop - Understanding Your Big Data


Some Make it 4V’s

Page 11: MongoDB & Hadoop - Understanding Your Big Data

What’s driving Big Data

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

Page 12: MongoDB & Hadoop - Understanding Your Big Data

Big Data – the full spectrum

[Diagram. Axis: Online/Realtime to Offline/Batch. Workloads: Transaction Processing; Analytical Processing; Data Mining, Visualization, and Integration Tools. Technologies: RDBMS; OLAP/DW; DW Appliance; Hadoop, Impala, ..; NoSQL; NewSQL, In-Memory, Stream...]

Page 13: MongoDB & Hadoop - Understanding Your Big Data

Hadoop Ecosystem

[Diagram. Layers: Programming Languages, Computation, Table Storage, Object Storage.]

Core Apache Hadoop
• HDFS (Hadoop Distributed File System)
• MapReduce (Distributed Programming Framework)

Related Apache Projects
• Hive (SQL)
• Pig (Data Flow)
• HBase (Wide Column Storage)
• HCatalog (Meta Data)
• ZooKeeper (Coordination)
• HMS (Management)

Page 14: MongoDB & Hadoop - Understanding Your Big Data

Database Technology Evolution

Page 15: MongoDB & Hadoop - Understanding Your Big Data

Data Management over the years

• 1960’s – File Systems
• 1970’s – 1st Generation DBMS, Data as Shared Resource
• 1980’s – Relational Technology, Ease of Query
• 1990’s – New data types, OLAP/DW, Web Support, Unstructured Data
• 2005+ – Big Data: Post-PC, Data Deluge, 3Vs, NoSQL

Page 16: MongoDB & Hadoop - Understanding Your Big Data

Operational vs. Analytics

[Diagram. Columns: Operational Database, Data warehouse.]

• 1990 – RDBMS
• 2000 – RDBMS; OLAP/DW
• 2010 – RDBMS, Key-Value/Wide-column, Document DB (NoSQL); OLAP/DW, Hadoop

Page 17: MongoDB & Hadoop - Understanding Your Big Data


MongoDB Features

• JSON Document Model with Dynamic Schemas

• Auto-Sharding for Horizontal Scalability

• Text Search

• Aggregation Framework and MapReduce

• Full, Flexible Index Support and Rich Queries

• Native Replication for High Availability

• Advanced Security

• Large Media Storage with GridFS

Page 18: MongoDB & Hadoop - Understanding Your Big Data

Documents are Rich Data Structures

{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

• Fields
• Typed field values (String, Number, Geo-Coordinates)
• Fields can contain arrays
• Fields can contain an array of sub-documents
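As a rough illustration of the document on this slide, the sketch below writes it as a Python dictionary and round-trips it through JSON. The fields elided with "…" on the slide are simply left out, and the variable names (`person`, `restored`, `car_models`) are made up for this example.

```python
import json

# Hypothetical Python rendering of the slide's example document.
# The fields the slide elides with "…" are omitted here.
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",                         # string value
    "city": "London",
    "location": [45.123, 47.232],                    # coordinate pair, as on the slide
    "Profession": ["banking", "finance", "trader"],  # array of strings
    "cars": [                                        # array of sub-documents
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Strings, numbers, arrays and nested documents all survive a JSON round trip.
restored = json.loads(json.dumps(person))

# Sub-documents are reached by ordinary nested access.
car_models = [car["model"] for car in restored["cars"]]
```

The point is only that one record carries its own nested structure, so reading a person's cars needs no join against a second table.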

Page 19: MongoDB & Hadoop - Understanding Your Big Data


Machine Generated Data

Page 20: MongoDB & Hadoop - Understanding Your Big Data

Fast Moving Data

• Hundreds of thousands of records per second
• Fast response required
• Sometimes all data kept, sometimes just summary
• Horizontal scalability required

Page 21: MongoDB & Hadoop - Understanding Your Big Data

Data is Structured, but Varied…

• A machine generates a specific kind of data
• The data model is unlikely to change
• But there are so many different machines…
• Queryability across all types
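A small sketch of what "structured, but varied" can look like in practice: each machine type keeps its own fixed set of measurement fields, while a few shared fields make it possible to query across all types. The machine types, field names, and the `find` and `since` helpers are invented for this illustration and run on plain Python lists rather than a database.

```python
# Hypothetical readings from two kinds of machines stored side by side.
# Shared fields: machine_type, machine_id, ts. The rest differs per type.
readings = [
    {"machine_type": "thermostat", "machine_id": "t-1", "ts": 1, "temperature_c": 21.5},
    {"machine_type": "turbine", "machine_id": "w-7", "ts": 2, "rpm": 1450, "vibration_mm_s": 2.1},
    {"machine_type": "thermostat", "machine_id": "t-2", "ts": 3, "temperature_c": 19.0},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def since(docs, ts):
    """Query across all machine types using only the shared timestamp field."""
    return [d for d in docs if d["ts"] >= ts]


thermostats = find(readings, machine_type="thermostat")
recent_ids = [d["machine_id"] for d in since(readings, 2)]
```

In a document database the same two lookups would be ordinary queries against one collection, with no per-machine table to maintain.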

Page 22: MongoDB & Hadoop - Understanding Your Big Data

Time Series Data

• Event data written multiple times per second, minute, or hour
• Tracking progression of metrics over time
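One common way to model this kind of data is to group events into one document per source per time bucket, keeping the raw samples next to a running summary. The sketch below does that in plain Python. The per-minute bucket size, the field names, and the `bucket_by_minute` helper are assumptions made for illustration, not something the slides prescribe.

```python
from datetime import datetime


def bucket_by_minute(events):
    """Group (timestamp, sensor_id, value) events into one document
    per sensor per minute, with raw samples and running totals."""
    buckets = {}
    for ts, sensor_id, value in events:
        minute = ts.replace(second=0, microsecond=0)
        doc = buckets.setdefault(
            (sensor_id, minute),
            {"sensor_id": sensor_id, "minute": minute,
             "samples": {}, "count": 0, "total": 0.0},
        )
        doc["samples"][str(ts.second)] = value  # keyed by second within the minute
        doc["count"] += 1
        doc["total"] += value
    return list(buckets.values())


events = [
    (datetime(2015, 5, 6, 12, 0, 1), "a", 1.0),
    (datetime(2015, 5, 6, 12, 0, 30), "a", 2.0),
    (datetime(2015, 5, 6, 12, 1, 5), "a", 4.0),
]
docs = bucket_by_minute(events)
```

Keeping `count` and `total` in each bucket means an average per minute can be read without touching the individual samples, which fits the "sometimes all data kept, sometimes just summary" pattern.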

Page 23: MongoDB & Hadoop - Understanding Your Big Data

Do More With Your Data

MongoDB

• Rich Queries
– Find Paul’s cars
– Find everybody in London with a car built between 1970 and 1980
• Geospatial
– Find all of the car owners within 5km of Trafalgar Sq.
• Text Search
– Find all the cars described as having leather seats
• Aggregation
– Calculate the average value of Paul’s car collection
• Map Reduce
– What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [51.524, -0.087],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
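The questions on this slide map fairly directly onto MongoDB query documents. The sketch below writes them as Python dictionaries in the shape a driver such as PyMongo accepts for `find()` and `aggregate()`, then checks two of them against the sample document with plain Python so the example runs without a server. Several things are assumptions: the geospatial filter presumes `location` is stored as a GeoJSON point (longitude first) with a 2dsphere index, the text filter presumes a text index on a hypothetical `cars.description` field, and the Trafalgar Square coordinates are approximate.

```python
# Rich query: find Paul's document (his cars are embedded in it).
paul = {"first_name": "Paul", "surname": "Miller"}

# Rich query: everybody in London with a car built between 1970 and 1980.
# $elemMatch makes a single car satisfy both bounds.
london_70s = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Geospatial: owners within 5 km of Trafalgar Square (assumes GeoJSON
# points and a 2dsphere index; coordinates are [longitude, latitude]).
near_trafalgar = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-0.1281, 51.5080]},
            "$maxDistance": 5000,  # metres
        }
    }
}

# Text search: assumes a text index on a hypothetical cars.description field.
leather = {"$text": {"$search": '"leather seats"'}}

# Aggregation: average value of Paul's car collection.
avg_pipeline = [
    {"$match": paul},
    {"$unwind": "$cars"},
    {"$group": {"_id": "$surname", "avg_value": {"$avg": "$cars.value"}}},
]

# Plain-Python equivalents, evaluated against the slide's sample document.
doc = {
    "first_name": "Paul", "surname": "Miller", "city": "London",
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}


def matches_london_70s(d):
    """Same condition as london_70s, expressed directly in Python."""
    return d.get("city") == "London" and any(
        1970 <= car["year"] <= 1980 for car in d.get("cars", [])
    )


def average_car_value(d):
    """Same result as avg_pipeline for a single matching document."""
    values = [car["value"] for car in d["cars"]]
    return sum(values) / len(values)
```

With a live collection, the dictionaries would be passed as-is, for example `collection.find(london_70s)` or `collection.aggregate(avg_pipeline)`.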

Page 24: MongoDB & Hadoop - Understanding Your Big Data

Hadoop & MongoDB

Page 25: MongoDB & Hadoop - Understanding Your Big Data

Enterprise Big Data Stack

[Diagram. Layers: Applications (CRM, ERP, Collaboration, Mobile, BI); Data Management (Online Data: RDBMS; Offline Data: EDW, Hadoop); Infrastructure (OS & Virtualization, Compute, Storage, Network). Alongside all layers: Management & Monitoring; Security & Auditing.]

Page 26: MongoDB & Hadoop - Understanding Your Big Data

MongoDB & Hadoop

Operational
• Online, Real-time
• High concurrency & HA
• Live analytics

Analytical
• Multi-source analytics
• Interactive & Batch
• Data lake

MongoDB Connector for Hadoop

Page 27: MongoDB & Hadoop - Understanding Your Big Data

Hadoop Is Good for…

• Risk Modeling
• Churn Analysis
• Recommendation Modeling
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake

Page 28: MongoDB & Hadoop - Understanding Your Big Data

MongoDB Is Good for…

• Single View
• Mobile Apps
• Fraud Detection
• Customer Data Management
• Content Management & Delivery
• Database-as-a-Service
• Product & Asset Catalogs
• Internet of Things
• Social & Collaboration

Page 29: MongoDB & Hadoop - Understanding Your Big Data

Customer Examples

Page 30: MongoDB & Hadoop - Understanding Your Big Data


Many more examples

Big Data Product & Asset Catalogs

Security & Fraud

Internet of Things

Database-as-a-Service

Mobile Apps

Customer Data Management

Single View

Social & Collaboration

Content Management

Intelligence Agencies

Top Investment and Retail Banks

Top US Retailer

Top Global Shipping Company

Top Industrial Equipment Manufacturer

Top Media Company

Top Investment and Retail Banks

Page 31: MongoDB & Hadoop - Understanding Your Big Data


MongoDB Enterprise Value

Page 32: MongoDB & Hadoop - Understanding Your Big Data

MongoDB+Hadoop Connector

• Makes MongoDB a Hadoop-enabled file system
• Full use of MongoDB’s indexes
• Read and write to live data, in-place
• Copy data between Hadoop and MongoDB
• Full support for data processing
– Hive
– MapReduce
– Pig
– Streaming
– EMR
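To make "read and write to live data" a little more concrete, the sketch below builds the kind of job configuration the connector reads. `mongo.input.uri` and `mongo.output.uri` are the connector's property names for the source and target collections. The `connector_conf` helper, the host, and the `crm.customers` and `crm.churn_scores` namespaces are hypothetical and exist only for this example.

```python
def connector_conf(input_uri=None, output_uri=None):
    """Build a Hadoop job configuration dict for the MongoDB connector.

    Only the URIs that are given are set, so the same helper covers
    read-only, write-only, and read-write jobs.
    """
    conf = {}
    if input_uri:
        conf["mongo.input.uri"] = input_uri    # collection the job reads from
    if output_uri:
        conf["mongo.output.uri"] = output_uri  # collection the job writes to
    return conf


# Hypothetical job: read live customer documents, write churn scores back.
conf = connector_conf(
    input_uri="mongodb://localhost:27017/crm.customers",
    output_uri="mongodb://localhost:27017/crm.churn_scores",
)

# With PySpark, a dict like this can be passed as the `conf` argument of
# SparkContext.newAPIHadoopRDD together with the connector's input format
# class (com.mongodb.hadoop.MongoInputFormat). That step needs a Spark
# installation and the connector jar, so it is not shown as runnable code.
```

Treat this as a sketch of the configuration shape only; a real job would also need the connector on the classpath and whatever authentication the deployment requires.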

Page 33: MongoDB & Hadoop - Understanding Your Big Data

Customer Example – MetLife

Customer Service
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection

Churn Analysis
• Customer action analysis
• Churn prediction algorithms

MongoDB Connector for Hadoop

Page 34: MongoDB & Hadoop - Understanding Your Big Data

Customer Example - eCommerce

Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)

Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine

MongoDB Connector for Hadoop

Page 35: MongoDB & Hadoop - Understanding Your Big Data

Roadmap

Capability        | Today                | Soon
Connectivity      | Custom               | Centralized Administration
MongoDB → Hadoop  | Dynamic reads        | Automated Snapshots
BSON Support      | MapReduce, Hive, Pig | Impala, Tez, Spark
Hadoop → MongoDB  | Dynamic writes       | Bulk Loader

Page 36: MongoDB & Hadoop - Understanding Your Big Data

Summary

• Big Data covers a wide spectrum
– Volume, Velocity, Variety
– Hence the mythical equation Big Data = Hadoop
• Enterprises are more concerned about Variety
– MongoDB provides the best platform
• Hadoop and MongoDB are complementary
– MongoDB for operational workloads
– Hadoop for analytical workloads

Page 37: MongoDB & Hadoop - Understanding Your Big Data