spark in the enterprise - 2 years later by alan saldich
Post on 15-Feb-2017
TRANSCRIPT
1© Cloudera, Inc. All rights reserved.
Spark in the Enterprise – 2 Years Later
Alan Saldich – Vice President, Marketing
A busy 2 years for Cloudera & Apache Spark
Timeline, 2013 to 2016:
• Announced support for Spark
• Shipped with CDH 4.4
• Spark on YARN integration
• Announces initiative to make Spark the standard execution engine
• Launches first Spark training
• Added Kerberos integration
• Cloudera engineers publish O’Reilly Spark book
Recent engineering contributions
Integration with Hadoop Ecosystem
• Spark-on-YARN integration
• Dynamic Resource Allocation
• Kafka Integration
• HBase Integration
• Fixed operational issues at scale
Production-Ready Features
• Security
  • Kerberos Integration
  • HDFS Sync (Sentry)
• Governance
  • Cloudera Navigator integration (audit & lineage)
• Monitoring/Troubleshooting
  • Improved debugging
• Zero Data Loss
  • Spark Streaming Resilience
Ongoing Initiatives
• Standard Execution Engine
  • Hive on Spark
  • Pig on Spark
  • Crunch on Spark
  • Solr indexing on Spark
2 years, 200+ customers
What are they doing with Spark?
Commonly Coinstalled (bar chart, 0% to 100%): Hive, HBase, Impala, Solr
Most Popular Use Cases (chart): Batch ETL, Predictive Machine Learning, MPI Alternative, Stream processing
What are they asking for?
• Security
  • At a minimum equivalent to market leading RDBMS
• Performance
  • At least as fast as the systems I’m familiar with today
• Simplicity
  • All the functionality I need to build my application but not more
Current Security Architecture: Inconsistency = Limited Access
Some engines support more granular restrictions than others.
Diagram labels: Policy A, Policy B; Impala (column-level); Spark (file-level); Sentry (policy definition)
RecordService: Unified Authorization Enforcement
Unified, Granular Policy Enforcement
A new high-performance security layer that centrally enforces access control policy. Complementing Apache Sentry, which provides unified policy definition, it delivers unified row- and column-based security, and dynamic data masking, to every Hadoop access path.
Benefits:
● Security: Fine-grained permissions and enforcement across Hadoop, building on Sentry.
● Interoperability: Developers don’t need to be aware of on-disk formats; transparently swap components.
Diagram labels: Impala, Spark, MR; RecordService (policy enforcement); Sentry (policy definition)
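The idea of one enforcement layer applying row filters, column restrictions, and masking for every access path can be illustrated with a small sketch. This is plain Python, not the RecordService or Sentry API; the names Policy, enforce, mask_ssn, read_table, TABLE, and ANALYST_POLICY, as well as the sample rows, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def allow_all(row: Row) -> bool:
    """Default row filter: every row is visible."""
    return True


@dataclass
class Policy:
    """Hypothetical policy record; not the Sentry or RecordService data model."""
    columns: List[str]                              # columns the role may read
    row_filter: Callable[[Row], bool] = allow_all   # rows the role may read
    masks: Dict[str, Callable[[object], object]] = field(default_factory=dict)


def enforce(rows: Iterable[Row], policy: Policy) -> Iterator[Row]:
    """Apply row filtering, column projection, and masking in one place."""
    for row in rows:
        if not policy.row_filter(row):
            continue  # row-level restriction
        visible = {}
        for col in policy.columns:  # column-level restriction
            mask = policy.masks.get(col)
            visible[col] = mask(row[col]) if mask else row[col]
        yield visible


def mask_ssn(value: object) -> str:
    """Dynamic masking: keep only the last four characters."""
    return "***-**-" + str(value)[-4:]


# Made-up sample data standing in for a stored table.
TABLE = [
    {"name": "Ann", "region": "EU", "ssn": "123-45-6789", "salary": 100},
    {"name": "Bob", "region": "US", "ssn": "987-65-4321", "salary": 120},
]

ANALYST_POLICY = Policy(
    columns=["name", "region", "ssn"],        # salary is never returned
    row_filter=lambda r: r["region"] == "US",
    masks={"ssn": mask_ssn},
)


def read_table(policy: Policy) -> List[Row]:
    """Every engine would call this single path instead of reading files directly."""
    return list(enforce(TABLE, policy))
```

Because every reader goes through read_table, the same policy applies no matter which engine issues the read, which is the property the slide attributes to a unified enforcement layer.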
Kudu: Fast Analytics on Fast Changing Data
Kudu fills the Gap: Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS
Diagram (axes: Pace of Analysis, Pace of Data; axis labels: Unchanging, Append-Only, Fast Changing / Frequent Updates, Real-Time):
• HDFS: Fast Scans, Analytics and Processing of Stored Data
• HBase: Fast On-Line Updates & Data Serving
• Arbitrary Storage (Active Archive)
• Analytic Gap
• Kudu: Fast Analytics (on fast-changing or frequently-updated data)
In conclusion
• Spark in the enterprise => we’re well on our way
• Cloudera in the community => we’re doing our part
• The applications you can build => will only get more powerful, more valuable
Thank You