© 2014 MapR Technologies
Why Spark on Hadoop Matters
MC Srivas, CTO and Founder, MapR Technologies
Apache Spark Summit - July 1, 2014
MapR Overview
Top Ranked
Exponential Growth
500+ Customers
Cloud Leaders
• 3X bookings Q1 ’13 – Q1 ’14
• 80% of accounts expand 3X
• 90% software licenses
• < 1% lifetime churn
• > $1B in incremental revenue generated by 1 customer
Rapidly Evolving Landscape
[Figure: Apache Hadoop and OSS ecosystem on the MapR Data Platform, with Management and Security layers. * = 2014 timeline.]
EXECUTION ENGINES
• Batch: Pig, Cascading, Spark, MR v1 & v2
• Streaming: Spark Streaming, Storm*
• NoSQL & Search: HBase, Solr, Accumulo*
• ML, Graph: Mahout, MLLib, GraphX
• SQL: Hive, Impala, Shark, Drill*
• Also shown: YARN, Tez*
DATA GOVERNANCE AND OPERATIONS
• Provision: Juju, Savannah*
• Workflow & Data Gov.: Oozie, Falcon*
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue
• Also shown: Sentry*, Knox*, ZooKeeper, Whirr
The Complete Spark Stack on Hadoop
[Figure: MapR Data Platform diagram, repeated from the "Rapidly Evolving Landscape" slide. Spark components in the stack: Spark (batch), Spark Streaming (streaming), MLLib and GraphX (ML, graph), Shark (SQL). * = 2014 timeline.]
A Winning Combination
Spark Advantages:
IN-MEMORY PERFORMANCE
• RDDs
• DAGs Unify Processing
EASE OF DEVELOPMENT
• Easier APIs
• Python, Scala, Java
COMBINE WORKFLOWS
• Shark, ML, Streaming, GraphX
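
The RDD and DAG bullets above can be made concrete with a toy model. The sketch below is plain Python, not Spark's API: `ToyRDD` and its methods are hypothetical names chosen to mirror the idea that transformations only record lineage, work happens when an action runs, and a cached result is served from memory on later actions.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class)."""

    def __init__(self, source: Callable[[], Iterable],
                 lineage: Optional[List[str]] = None):
        self._source = source              # function that (re)produces the data
        self.lineage = lineage or ["source"]
        self._keep = False                 # set by cache()
        self._cached: Optional[list] = None

    # Transformations: record a new step, compute nothing yet.
    def map(self, fn: Callable) -> "ToyRDD":
        parent = self
        return ToyRDD(lambda: (fn(x) for x in parent._iterate()),
                      self.lineage + ["map"])

    def filter(self, pred: Callable) -> "ToyRDD":
        parent = self
        return ToyRDD(lambda: (x for x in parent._iterate() if pred(x)),
                      self.lineage + ["filter"])

    def cache(self) -> "ToyRDD":
        self._keep = True                  # keep results in memory once computed
        return self

    def _iterate(self) -> Iterator:
        if self._cached is not None:       # served from memory
            return iter(self._cached)
        data = self._source()
        if self._keep:
            self._cached = list(data)
            return iter(self._cached)
        return iter(data)

    # Actions: these trigger the actual computation.
    def collect(self) -> list:
        return list(self._iterate())

    def count(self) -> int:
        return sum(1 for _ in self._iterate())


# Demo: count how often the underlying source is actually read.
reads = []

def load():
    reads.append(1)
    return range(10)

evens_squared = (ToyRDD(load)
                 .filter(lambda x: x % 2 == 0)
                 .map(lambda x: x * x)
                 .cache())

reads_before_action = len(reads)   # no action has run yet
result = evens_squared.collect()   # first action: reads the source
total = evens_squared.count()      # second action: uses the cached list
```

In this toy, the source is read once even though two actions run, which is the intuition behind reusing an in-memory dataset across several steps of a workflow. Real Spark adds partitioning, distribution, and fault recovery from lineage, none of which is modeled here.
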
Hadoop Advantages:
UNLIMITED SCALE
• Multiple data sources
• Multiple applications
• Multiple users
WIDE RANGE OF APPLICATIONS
• Files
• Databases
• Semi-structured
ENTERPRISE PLATFORM
• Reliability
• Multi-tenancy
• Security
The Combination of Spark on Hadoop
IN-MEMORY PERFORMANCE
EASE OF DEVELOPMENT
COMBINE WORKFLOWS
UNLIMITED SCALE
WIDE RANGE OF APPLICATIONS
ENTERPRISE PLATFORM
Operational Applications Augmented by In-Memory Performance
Case Studies
Industry Leading Ad-Targeting Platform
• High performance analytics over MapR M7 NoSQL
• Load from M7 table into RDD to augment scoring in real-time
• Results fed back to M7 for other applications
Leading Pharma Company: NextGen Genomics
Existing process takes several weeks to align chemical compounds with genes
ADAM on Spark allows realignment in a few hours
Geneticists can minimize engineering dependency
Cisco: Security Intelligence Operations
Sensor data lands in M7
Spark Streaming on M7 for first check on known threats
Data next processed on GraphX and Mahout
Results queried using SQL via Shark and Impala
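
As an illustration of the "first check on known threats" step, here is a minimal plain-Python sketch of a micro-batch filter. It does not use Spark Streaming or M7; the event fields (`src`, `bytes`), the `known_threats` set, and the function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Event = Dict[str, object]


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group an incoming event stream into small fixed-size batches."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                      # emit the final, possibly smaller batch
        yield batch


def first_check(batch: List[Event],
                known_threats: Set[str]) -> Tuple[List[Event], List[Event]]:
    """Split a batch into events from known-bad sources and everything else."""
    flagged = [e for e in batch if e["src"] in known_threats]
    passed = [e for e in batch if e["src"] not in known_threats]
    return flagged, passed


# Hypothetical sensor events (addresses are from documentation-only ranges).
events = [
    {"src": "192.0.2.10", "bytes": 120},
    {"src": "198.51.100.7", "bytes": 900},
    {"src": "192.0.2.11", "bytes": 64},
    {"src": "198.51.100.7", "bytes": 15},
    {"src": "203.0.113.5", "bytes": 300},
]
known_threats = {"198.51.100.7"}

flagged_all: List[Event] = []
forwarded: List[Event] = []        # would go on to later analysis stages
for batch in micro_batches(events, batch_size=2):
    flagged, passed = first_check(batch, known_threats)
    flagged_all.extend(flagged)
    forwarded.extend(passed)
```

The cheap set-membership test runs on every batch as it arrives, while the events that pass are left for heavier downstream processing, which is the role the slide assigns to GraphX and Mahout.
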
Insurance Giant: Addressing Health Care Regulations
Patient information in M7 combined with clinical records to compute re-admittance probability
Process uses Spark with transactional data in M7
Insurance options decided in real-time on online portals
In Summary
Spark on Hadoop gains traction for real-time applications
Pick the Right Tool for the Job
MapR is Unbiased Open Source (a la Linux)
• Open source distribution is about providing choice
– Linux includes MySQL, PostgreSQL and SQLite
– Linux includes Apache httpd, nginx and Lighttpd

                  MapR Distribution for Hadoop     Distribution C        Distribution H
Spark             Spark (all of it) and Shark      Spark only            No
Interactive SQL   Shark, Impala, Drill, Hive/Tez   One option (Impala)   One option (Hive/Tez)
Versions          Hive 0.10, 0.11, 0.12, 0.13      One version           One version
                  Pig 0.11, 0.12
                  HBase 0.94, 0.98
Engage with us!
@mapr
maprtech
MapR
mapr-technologies
Thank you