Meruvian - Introduction to MapR
© 2014 MapR Technologies
Frans Thamura / Meruvian / [email protected] March 2014
MapR Overview
BIG DATA
BEST PRODUCT
BUSINESS IMPACT
Hadoop Top Ranked
Production Success
3 Trends Forcing a revolution in enterprise architecture
Trend 1: Industry Leaders Compete and Win with Data
More Data Beats Better Algorithms
Collecting interaction data from ecommerce, social media, offline, and call centers enables a “customer 360 view” and consumer intimacy
Competitive Advantage is Decided by 0.5%
• Consumer financial services: 1% improvement in fraud detection means hundreds of millions of dollars
• Advertising and retail: 0.5% improvement in lift means millions of dollars increase in profitability
Trend 2: Big Data is Overwhelming Traditional Systems
[Diagram: Enterprise Data Architecture, with enterprise users, outside sources, operational systems and analytical systems]
OPERATIONAL SYSTEMS: PRODUCTION REQUIREMENTS
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
ANALYTICAL SYSTEMS: PRODUCTION REQUIREMENTS
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
Trend 3: Hadoop: The Disruptive Technology at the Core of Big Data
[Charts: job trends from Indeed.com; Google Trends interest over time, 2004 to 2014]
And 3 Realities
Reality 1: Hadoop Relieves the Pressure from Enterprise Systems
[Diagram: Hadoop alongside operational systems, analytical systems and enterprise users]
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
Reality 2: What Would Google Do?
2003 GFS
2004 MapReduce
2006 BigTable
Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
(1) The transition from batch to real-time
    2004 Web index is batch (GFS/MapReduce)
    2010 Web index is real-time (BigTable)
(2) The explosion in operational applications
Reality 3: Architecture Matters for Success
FOUNDATION
• Data protection & security
• High performance
• Multi-tenancy
• Operational & Analytical Workloads
• Open standards for integration
NEW APPLICATIONS • SLAs • TRUSTED INFORMATION • LOWER TCO
MapR: Architecture Matters
104M CARD MEMBERS
Fortune 100 Financial Services Company
Advertising Automation Cloud
Sellers Cloud
Buyers Cloud
100B AD AUCTIONS per day
45M SHOPPERS
analyzed each month
Fortune 100 Retailer
20M SONGS
Largest Biometric Database in the World
1.3B PEOPLE
Common Use Cases: Taking Advantage of Hadoop
ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration
MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization
RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis
OPERATIONAL INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis
MapR is the Hadoop Technology Leader
BIG DATA HADOOP
The Power of the Open Source Community
Management
APACHE HADOOP AND OSS ECOSYSTEM
EXECUTION ENGINES
• Batch: Spark, Cascading, Pig, MapReduce v1 & v2, Tez*
• Streaming: Spark Streaming, Storm*
• NoSQL & Search: HBase, Solr, Accumulo*
• ML, Graph: Mahout, MLLib, GraphX
• SQL: Hive, Impala, Shark, Drill
• YARN
DATA GOVERNANCE AND OPERATIONS
• Security: Sentry*, Knox*
• Workflow & Data Governance: Oozie, Falcon*
• Provisioning & coordination: Juju, Savannah*, ZooKeeper
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue
MapR Data Platform
• MapR-DB
• MapR-FS
* Certification/support planned for 2014
MapR Distribution for Hadoop
[Diagram: the Apache Hadoop and OSS ecosystem on the MapR Data Platform (MapR-DB, MapR-FS), repeated from the previous slide]
Enterprise-grade
• High availability
• Data protection
• Disaster recovery
Security
• Enterprise security authorization
• Wire-level authentication
• Data governance
Operational
• Ability to support predictive analytics, real-time database operations, and support high arrival rate data
Performance
• 2X to 7X higher performance
• Consistent, low latency
Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
Interoperability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support
MapR Distribution for Hadoop
Management
APACHE HADOOP AND OSS ECOSYSTEM
EXECUTION ENGINES
• Batch: Spark, Cascading, Pig, MapReduce v1 & v2, Tez*
• Streaming: Spark Streaming, Storm*
• NoSQL & Search: HBase, Solr, Accumulo*
• ML, Graph: Mahout, MLLib, GraphX
• SQL: Hive, Impala, Shark, Drill*
• YARN
DATA GOVERNANCE AND OPERATIONS
• Security: Sentry*, Knox*
• Workflow & Data Governance: Oozie, Falcon*
• Provisioning & coordination: Juju, Savannah*, ZooKeeper, Whirr
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue
MapR Data Platform
* Certification/support planned for 2014
Enterprise-grade
• High availability
• Data protection
• Disaster recovery
– Instant stateful failover
– 99.999% availability
– Consistent snapshots
– Point-in-time recovery
– Self-healing
– WAN replication
– RTO with mirroring
– Job Tracker HA
– System resource protection
– Job isolation and user quotas
Security
• Enterprise security authorization
• Wire-level authentication
• Data governance
– Kerberos support
– Native key-based authentication
– Enterprise directory integration LDAP/NIS/AD
– Linux PAM
– Role-based access control with Boolean expressions
– Intel AES-NI high performance encryption
Operational
• Ability to support predictive analytics, real-time database operations, and support high arrival rate data
– Integrated in-Hadoop database
– Consistent low latency
– Instant recovery for database operations
– No compactions
– Elimination of read/write amplification
– Zero administration
Performance
• 2X to 7X higher performance
• Consistent, low latency
– No-NameNode distributed architecture
– Database performance with no compactions or defragmentation
– Automated compression
Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
– Data placement control
– Job placement control
– Logical volumes
– Ability to leverage enterprise access control to isolate and secure data access
– Enforce SLAs, provide job isolation
Interoperability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support
– NFS support
– POSIX
– Random read/write
– Concurrent read/write
– JDBC/ODBC
– Nagios/Ganglia integration
– REST API
MapR: Best Solution for Customer Success
Top Ranked Exponential Growth
500+ Customers
Premier Investors
>2x annual bookings
80% of accounts expand 3X
90% software licenses
< 1% lifetime churn
> $1B in incremental revenue generated by 1 customer
Forrester Wave™: Big Data Hadoop Solutions, Q1‘14
The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
MapR: The Top Ranked Current Offering
“The score speaks for itself. MapR has added some unique innovations to its Hadoop distribution, including support for Network File System (NFS), running arbitrary code in the cluster, performance enhancements for HBase, as well as high-availability and disaster recovery features.”
[Chart: Forrester Wave. Axes: current offering (weak to strong) and strategy (weak to strong); legend: market presence; bands: Risky Bets, Contenders, Strong Performers, Leaders]
High Availability & Data Protection
Business Continuity
High Availability
Data Protection
Disaster Recovery
What are your requirements?
What do you have for your enterprise storage, databases and data warehouses?
High Availability (HA) Everywhere
No NameNode architecture
• Distributed metadata can self-heal
• No practical limit on # of files
MapReduce/YARN HA
• Jobs are not impacted by failures
• Meet your data processing SLAs
NFS HA
• High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
Instant recovery
• Files and tables are accessible within seconds of a node failure or cluster restart
Rolling upgrades
• Upgrade the software with no downtime
HA is built in
• No special configuration to enable HA
• All MapR customers operate with HA
Apache Hadoop NameNode High Availability
HDFS-based Distributions
[Diagram: three layouts over a set of DataNodes: a single NameNode holding metadata A to F; HDFS HA with a primary and a standby NameNode, each holding A to F; HDFS Federation with NameNodes holding A B, C D and E F]
NameNode (single)
• Single point of failure
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory
HDFS HA
• Only one active NameNode
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports
HDFS Federation
• Multiple single points of failure w/o HA
• Needs 20 NameNodes for 1 Billion files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports
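The Federation figure follows from the per-NameNode file limit quoted on this slide. A minimal sketch of that arithmetic, assuming the slide's 50-200 million range is the only constraint; the function name is illustrative and not part of any Hadoop API:

```python
import math


def namenodes_needed(total_files: int, files_per_namenode: int) -> int:
    """Federated NameNodes needed to hold metadata for total_files."""
    return math.ceil(total_files / files_per_namenode)


# Figures from the slide: 1 billion files, 50-200 million files per NameNode.
conservative = namenodes_needed(1_000_000_000, 50_000_000)   # lower limit
optimistic = namenodes_needed(1_000_000_000, 200_000_000)    # upper limit
print(conservative, optimistic)
```

With the lower limit of 50 million files per NameNode the count matches the 20 NameNodes quoted above; with the upper limit it drops to 5.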
No-NameNode Architecture
[Diagram: an HDFS cluster with one NameNode holding metadata A to F over its DataNodes, next to a MapR cluster in which metadata A to F is distributed and replicated across the nodes]
• Up to 1T files (> 5000x advantage)
• Significantly less hardware & OpEx
• Higher performance
• No special config to enable HA
• Automatic failover & re-replication
• Metadata is persisted to disk
Data Protection: Replication and Snapshots
Replication
• Protect from hardware failures
• File chunks, table regions and metadata are automatically replicated (3x by default)
• At least one replica on a different rack
Snapshots
• Protect from user and application errors
• Point-in-time recovery
• Redirect on write
• No performance or scale impact
• Read files and tables directly from snapshot
[Diagram: chunks C1 to C7 replicated across nodes; an active volume and a snapshot sharing blocks A, B, C and D, with the modified block written as D₁]
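The "redirect on write" bullet and the A B C D D₁ diagram can be illustrated with a toy model. This is only a sketch of the general technique, not MapR's implementation; the `Volume` class and its methods are hypothetical names invented for the example:

```python
from typing import Dict, Optional


class Volume:
    """Toy volume: block names map to immutable stored blocks."""

    def __init__(self) -> None:
        self._store: Dict[int, bytes] = {}             # block id -> data, never overwritten
        self._active: Dict[str, int] = {}              # live view: block name -> block id
        self._snapshots: Dict[str, Dict[str, int]] = {}
        self._next_id = 0

    def write(self, name: str, data: bytes) -> None:
        # Redirect on write: new data lands in a fresh block id, so a
        # snapshot that still points at the old id is left untouched.
        block_id = self._next_id
        self._next_id += 1
        self._store[block_id] = data
        self._active[name] = block_id

    def snapshot(self, snap_name: str) -> None:
        # Only the small name-to-id map is copied; data blocks are shared.
        self._snapshots[snap_name] = dict(self._active)

    def read(self, name: str, snap_name: Optional[str] = None) -> bytes:
        view = self._active if snap_name is None else self._snapshots[snap_name]
        return self._store[view[name]]


vol = Volume()
for block in "ABCD":
    vol.write(block, f"{block}-v1".encode())
vol.snapshot("before-change")
vol.write("D", b"D-v2")                      # the D -> D1 step in the diagram

print(vol.read("D"))                         # live volume sees the new block
print(vol.read("D", "before-change"))        # snapshot still sees the old one
```

Because taking a snapshot copies only the block map, its cost does not grow with the amount of data, which is the intuition behind the "no performance or scale impact" bullet.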
Disaster Recovery: Mirroring
• Flexible
  – Choose the volumes/directories to mirror
  – You don’t need to mirror the entire cluster
  – Active/active
• Fast
  – No performance impact
  – Block-level (8KB) deltas
  – Automatic compression
• Safe
  – Point-in-time consistency
  – End-to-end checksums
• Easy
  – Graceful handling of network issues
  – No third-party software
  – Takes less than two minutes to configure!
[Diagram: a production cluster in Datacenter 1 mirrored over the WAN to a production/research cluster in Datacenter 2, and over the WAN to EC2]
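To make "block-level (8KB) deltas" and "end-to-end checksums" concrete, here is a minimal sketch of the general idea: compare per-block digests, ship only the blocks that differ, then verify the result with a whole-file checksum. The 8KB block size comes from the slide; the use of SHA-256 and all function names are assumptions of this example, not a description of MapR's mirroring protocol:

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 8 * 1024  # 8KB, the delta granularity quoted on the slide


def block_digests(data: bytes) -> List[bytes]:
    """One digest per 8KB block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(source: bytes, mirror: bytes) -> Dict[int, bytes]:
    """Return {block index: block bytes} for blocks the mirror lacks or has stale."""
    src, dst = block_digests(source), block_digests(mirror)
    delta = {}
    for i, digest in enumerate(src):
        if i >= len(dst) or dst[i] != digest:
            delta[i] = source[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
    return delta


def apply_delta(mirror: bytes, delta: Dict[int, bytes], new_length: int) -> bytes:
    """Patch the mirror copy with the shipped blocks and fix its length."""
    buf = bytearray(mirror[:new_length].ljust(new_length, b"\0"))
    for i, block in delta.items():
        start = i * BLOCK_SIZE
        buf[start:start + len(block)] = block
    return bytes(buf)


old = b"a" * (3 * BLOCK_SIZE)
new = bytearray(old)
new[BLOCK_SIZE + 10] = ord("b")              # touch one byte in the second block
new = bytes(new)

delta = changed_blocks(new, old)
patched = apply_delta(old, delta, len(new))
print(sorted(delta), sum(len(b) for b in delta.values()))
# End-to-end check: whole-file digests must agree after the transfer.
print(hashlib.sha256(patched).digest() == hashlib.sha256(new).digest())
```

In this example a one-byte change in a 24KB file results in a single 8KB block being transferred rather than the whole file.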
Interoperability
Seamless Integration with Direct Access NFS
• MapR is POSIX compliant
  – Random reads/writes
  – Simultaneous reading and writing to a file
  – Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
  – Stream data into the cluster
  – Leverage thousands of tools and applications
  – Easier to use non-Java programming languages
  – No need for most proprietary Hadoop connectors
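If the cluster is mounted over NFS with POSIX semantics, ordinary file APIs should be all a non-Java program needs. A minimal sketch of a random write using plain Python file I/O; the `CLUSTER_NFS_MOUNT` environment variable is a hypothetical name, and a temporary directory stands in for the mount so the example runs without a cluster:

```python
import os
import tempfile

# Hypothetical mount location; point this at wherever the cluster is
# NFS-mounted. A temporary directory is used when the variable is unset.
MOUNT = os.environ.get("CLUSTER_NFS_MOUNT") or tempfile.mkdtemp()
path = os.path.join(MOUNT, "results.csv")

# Ordinary sequential write.
with open(path, "wb") as f:
    f.write(b"id,score\n1,0.10\n2,0.20\n")

# Random write: seek into the middle of the file and overwrite in place.
with open(path, "r+b") as f:
    f.seek(len(b"id,score\n1,"))
    f.write(b"0.99")

with open(path, "rb") as f:
    contents = f.read()
print(contents.decode())
```

Nothing in the script is Hadoop-specific, which is the point of the "no need for most proprietary Hadoop connectors" bullet.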
When Hadoop Looks Like a NAS…
• Data ingestion is easy
  – Popular online gaming company changed data ingestion from a complex Flume cluster to a 17-line Python script
• Database bulk import/export with standard vendor tools
  – Large telco saved $30M on EDW costs (5 years) by leveraging MapR to pre-process and store raw data prior to loading into EDW
• 1000s of applications/tools
  – Large credit card company uses MapR volumes as the user home directories on the Hadoop gateway servers
[Diagram labels: Application servers, Logs]
$ find . | grep log
$ cp
$ vi results.csv
$ scp
$ tail -f part-00000
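The slide does not show the gaming company's script, so the following is only a guess at what a short ingestion script of that kind could look like: it copies new log files from an application server's local directory into an NFS-mounted cluster directory. The function name and both directory arguments are hypothetical:

```python
import shutil
from pathlib import Path
from typing import List


def ingest_logs(source_dir: str, cluster_dir: str) -> List[str]:
    """Copy *.log files not yet present in cluster_dir; return the names copied."""
    src, dst = Path(source_dir), Path(cluster_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for log in sorted(src.glob("*.log")):
        target = dst / log.name
        if target.exists():
            continue                      # already ingested on an earlier run
        tmp = target.with_name(target.name + ".tmp")
        shutil.copy2(log, tmp)            # copy under a temporary name first
        tmp.rename(target)                # then rename, so partial files are not picked up by name
        copied.append(log.name)
    return copied
```

A call such as `ingest_logs("/var/log/app", "/path/to/cluster/mount/logs")` (both paths are placeholders) could be run from cron on each application server; a second run copies only files that have appeared since the first.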
Multi-Tenancy & Security
Volumes
100K volumes are OK, create as many as needed
Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
/projects
    /tahoe
    /yosemite
/user
    /msmith
    /bjohnson
Multi-tenancy
Isolation
• Tasks sandboxed so they don’t impact other tasks or system daemons
• System resources protected from runaway jobs
• Volume-based data placement
• Label-based job scheduling
Quotas
• Storage quotas by volume/user/group
• CPU and memory quotas by queue/user/group
Security and delegation
• Wire-level authentication and encryption (Kerberos not required)
• Fine-grained administration permissions including volume-level delegation
• Authenticate users to AD, LDAP and Kerberos via Linux PAM
Reporting
• Detailed reporting on resource usage (75+ different metrics)
• All reports are available via UI, CLI and REST API
MapR Integrates Security into Hadoop
Making Security Easy
> 99% of consumers accessing online banks use strong wire-level authentication
< 5% of organizations deploying Hadoop enable strong wire-level authentication
Hadoop Security
Authorization to ensure the right access to files and databases
Authentication for users and user-created job requests
Encryption to ensure user credentials and data are always secure
Integration with existing security infrastructure
… Along With Fine-Grained Access Control
• Full POSIX permissions on files and directories
• ACLs on tables, column families and columns
• ACLs on MapReduce jobs and queues
• Administration ACLs on cluster and volumes
• Access control expressions for easy, role-based control
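An access control expression combines user and group tests with Boolean operators. The slides do not give the expression syntax, so the `u:name` / `g:name` terms and the `&`, `|`, `!` operators below are assumptions made for this sketch, and the evaluator itself is purely illustrative rather than MapR code:

```python
import re
from typing import List, Set

TOKEN = re.compile(r"\s*([()&|!]|[ug]:[\w.-]+)")


def tokenize(expr: str) -> List[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad token at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def allowed(expr: str, user: str, groups: Set[str]) -> bool:
    """Evaluate expr for one user; '!' binds tightest, then '&', then '|'."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()          # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok.startswith("u:"):
            return tok[2:] == user
        if tok.startswith("g:"):
            return tok[2:] in groups
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError("trailing tokens after expression")
    return result


print(allowed("g:analysts & !u:intern", "alice", {"analysts"}))
print(allowed("g:analysts & !u:intern", "intern", {"analysts"}))
```

A single expression such as `g:analysts & !u:intern` replaces what would otherwise be a list of per-user grants, which is the sense in which the slide calls this "easy, role-based control".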
Integration with Existing Security Infrastructure
• SSO with existing Kerberos infrastructure (optional)
• Linux PAM integration enables third-party user directories
MapR supports wire-level authentication with and without Kerberos
[Diagram: a client without Kerberos sends a username/password over HTTPS to the Hadoop cluster; a Kerberos-enabled client presents a Kerberos service ticket; usernames/passwords are checked against the existing security infrastructure, a Kerberos KDC and a user directory (AD, LDAP, NIS, …)]
Native Security Authentication
Ease of Deployment
• Hadoop initiates and maintains secure key communication* throughout the cluster without requiring external validation
• Users authenticate themselves through a simple and secure login-password mechanism
• All cluster nodes authenticate and interact with each other through secure keys
Cluster-wide Security
All operations on Hadoop are secured natively including:
• User operations such as file reads and writes, database manipulations, MapReduce job submissions
• Intra-cluster node-node interactions including remote procedure calls
• Inter-cluster operations such as mirroring
* MapR leverages standard cryptography: NSA Suite B Cryptography (AES-256 and SHA-384)
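As a rough illustration of nodes authenticating messages "through secure keys", the sketch below tags a request with HMAC over SHA-384 and rejects a tampered copy. Only the SHA-384 hash is taken from the slide; the shared key, the request format and the function names are assumptions, this is not MapR's wire protocol, and AES-256 encryption is left out because the Python standard library has no AES implementation:

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Authentication tag for message under key, using HMAC-SHA-384."""
    return hmac.new(key, message, hashlib.sha384).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(sign(key, message), tag)


cluster_key = secrets.token_bytes(32)          # hypothetical 256-bit shared key
request = b"read /projects/tahoe/part-00000"   # hypothetical request format
tag = sign(cluster_key, request)

print(verify(cluster_key, request, tag))
print(verify(cluster_key, b"read /user/msmith/secret", tag))
```

A node that does not hold the key cannot produce a valid tag, so a receiver can reject both forged and modified requests without consulting an external service.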
Performance Leader
World-Record Performance
NEW MINUTESORT WORLD RECORD
1.65 TB IN 1 MINUTE
298 NODES
PREVIOUS RECORD: 1.6 TB with 2200 nodes
MapR: With a Fraction of the Hardware
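The "fraction of the hardware" claim can be checked by dividing each record's data volume by its node count. A small worked calculation using only the figures on this slide, with 1 TB taken as 1000 GB (an assumption about the units):

```python
def gb_per_node(total_tb: float, nodes: int) -> float:
    """Data sorted per node in one minute, in GB (1 TB taken as 1000 GB)."""
    return total_tb * 1000 / nodes


new_record = gb_per_node(1.65, 298)        # new MinuteSort record
previous_record = gb_per_node(1.6, 2200)   # previous record
ratio = new_record / previous_record

print(round(new_record, 2), round(previous_record, 2), round(ratio, 1))
```

That works out to roughly 5.5 GB per node per minute against roughly 0.7 GB, so each node in the new record sorted more than seven times as much data.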
Comparative Study of Hadoop Distributions
Read and Write Throughput Benchmarks
[Charts: DFSIO read throughput and DFSIO write throughput, in MB per second, for IDH, CDH, HDP and MapR; bar values as extracted: 212, 59, 262, 69, 276, 64, 475, 465]
Source: Flux7 Labs Study, October 2013
MapR-DB: The Best In-Hadoop Database
▪ NoSQL wide-column store
▪ Apache HBase API
▪ Integrated with Hadoop
The most scalable, enterprise-grade, NoSQL database that supports online applications and analytics
[Diagram: in other distros, HBase (JVM) runs on HDFS (JVM), which runs on ext3/ext4 on disks; in MapR Enterprise Database Edition (M7), MapR-DB tables/files sit directly on disks]
Consistent, Low Latency
[Chart: M7 read latency versus others' read latency]
Operations + Analytics = Real-time, Personalized Services
Fraud model Recommendations table
MapR Distribution for Hadoop
Fraud investigator
Interactive marketer
Online transactions
Fraud detection
Personalized offers
Clickstream analysis
Fraud investigation tool
Real-time Operational Applications
Analytics
Ensuring Your Success
Committed to our Customers’ Success
Educational Services
• Instructor-led courses & Web-based training for Hadoop cluster administration, HBase & MapReduce programming and more
Professional Services
• Core Hadoop Services
• Data Engineering
• Advanced Analytics
• M7/HBase Practice
Customer Support
• Hadoop engineering experts provide 24x7x365 global coverage
Data Engineering
Data Science
WORLDWIDE PRESENCE &
CUSTOMER SUPPORT
HQ
Key MapR Advantage Partners Business Services
INFRASTRUCTURE & CLOUD
ANALYTICS & BUSINESS INTELLIGENCE
APPLICATIONS & OS
CONSULTANTS & INTEGRATORS
DATA WAREHOUSE & INTEGRATION
From Redundant Processing Silos and Data Science Experiments…
Opportunity to Revolutionize Enterprise Data Architecture
… to Consolidated Operational and Analytical Workloads
The Production Enterprise Data Hub
Summary
BIG DATA
BEST PRODUCT
BUSINESS IMPACT
Hadoop Top Ranked
Production Success
Q & A
@mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies
Extra slides
Packages Supported by various distributions
Component | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache Versions (Sep 12th, 2014)
Core Hadoop
Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1
Batch
Map Reduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2
Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13
Tez | 0.4 (Dev Preview Only) | X | 0.4 | 0.5
Pig | 0.12 | 0.12 | 0.12 | 0.12
Cascading | 2.1.6 | X | X | 2.5
Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Interactive SQL
Impala | 1.2.3 | 1.4 | X | 1.4
Drill | 0.5 | X | X | 0.5
SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1
NoSQL and Search
HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98
Phoenix | X | X | 4.0.0 | 4.1.0
AsyncHBase | 1.5 | X | X | 1.5
Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA
Machine Learning and Graph
Mahout | 0.9 | 0.9 | 0.9 | 0.9
MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging
Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2
Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1
Data Integration
Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5
Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0
Knox | X | X | 0.4 | 0.4
Coordination
Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1
Zookeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5
GUI, Configuration, Monitoring
Management | MCS | CM | Ambari | Ambari
Hue | 3.5 | 3.6 | 2.5.1 | 3.6
Red: lacking. Blue: leading.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html
The Cloud Leaders Pick MapR
Google chose MapR to provide Hadoop on Google Compute Engine
Amazon EMR is the largest Hadoop provider in revenue and # of clusters