Spark in the Enterprise - 2 Years Later, by Alan Saldich

Post on 15-Feb-2017

© Cloudera, Inc. All rights reserved.

Spark in the Enterprise - 2 Years Later
Alan Saldich - Vice President, Marketing

A busy 2 years for Cloudera & Apache Spark

Timeline: 2013, 2014, 2015, 2016

• Announced support for Spark
• Shipped Spark with CDH 4.4
• Integrated Spark with YARN
• Announced an initiative to make Spark the standard execution engine
• Launched the first Spark training
• Added Kerberos integration
• Cloudera engineers published an O’Reilly Spark book

Recent engineering contributions

Integration with Hadoop Ecosystem
• Spark-on-YARN integration
• Dynamic Resource Allocation
• Kafka Integration
• HBase Integration
• Fixed operational issues at scale

Production-Ready Features
• Security: Kerberos Integration, HDFS Sync (Sentry)
• Governance: Cloudera Navigator integration (audit & lineage)
• Monitoring/Troubleshooting: Improved debugging
• Zero Data Loss: Spark Streaming Resilience

Ongoing Initiatives
• Standard Execution Engine: Hive on Spark, Pig on Spark, Crunch on Spark, Solr indexing on Spark
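Several of the features on this slide are switched on through Spark configuration properties. The sketch below is only an illustration of that idea: the three property names are standard Spark settings (dynamic allocation, the external shuffle service it relies on when running on YARN, and the receiver write-ahead log used for Spark Streaming resilience), while the `--master yarn` choice, the helper `to_submit_args`, and the application file `app.py` are assumptions made for the example and do not come from the slides.

```python
from typing import Dict, List

# Standard Spark property names; the values are illustrative only.
SPARK_CONF: Dict[str, str] = {
    # Let Spark grow and shrink the number of executors with the workload.
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service, so executors can be released safely.
    "spark.shuffle.service.enabled": "true",
    # Write-ahead log for receiver-based Spark Streaming sources.
    "spark.streaming.receiver.writeAheadLog.enable": "true",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render a property dict as repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical command line; `app.py` is a placeholder application.
submit_args = ["spark-submit", "--master", "yarn"] + to_submit_args(SPARK_CONF) + ["app.py"]
print(" ".join(submit_args))
```

Keeping the properties in one dictionary makes it easy to review which production features a job depends on before it is submitted.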

2 years, 200+ customers

What are they doing with Spark?

[Chart: Commonly Coinstalled (axis 0% to 100%): Hive, HBase, Impala, Solr]

[Chart: Most Popular Use Cases: Batch ETL, Predictive Machine Learning, MPI Alternative, Stream processing]

What are they asking for?

• Security: At a minimum, equivalent to market-leading RDBMS
• Performance: At least as fast as the systems I’m familiar with today
• Simplicity: All the functionality I need to build my application, but not more

Current Security Architecture: Inconsistency = Limited Access

Some engines support more granular restrictions than others.

[Diagram, current architecture: Sentry (policy definition); Policy A; Policy B; Impala (column-level); Spark (file-level)]

RecordService: Unified Authorization Enforcement

Unified, Granular Policy Enforcement

A new high-performance security layer that centrally enforces access control policy. Complementing Apache Sentry, which provides unified policy definition, it delivers unified row- and column-based security, and dynamic data masking, to every Hadoop access path.

Benefits:

● Security: Fine-grained permissions and enforcement across Hadoop, building on Sentry.
● Interoperability: Developers don’t need to be aware of on-disk formats; transparently swap components.

[Diagram, with RecordService: Sentry (policy definition); RecordService (policy enforcement); Impala; Spark; MR]
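The central-enforcement idea on the RecordService slide can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the RecordService or Sentry API: the `Policy` class, the `enforce` function, and the sample table are all hypothetical names invented for the example. It shows one enforcement point applying a row filter, a column allow-list, and dynamic masking, so that any engine reading through it sees the same restricted view.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Policy:
    """Hypothetical policy for one role on one table (not a Sentry type)."""
    columns: List[str]                      # columns the role may read
    row_filter: Callable[[Row], bool]       # rows the role may see
    masks: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)


def enforce(rows: List[Row], policy: Policy) -> List[Row]:
    """Single enforcement point: filter rows, project columns, mask values."""
    visible_rows: List[Row] = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row-level security
        projected: Row = {}
        for col in policy.columns:  # column-level security
            mask = policy.masks.get(col)
            projected[col] = mask(row[col]) if mask else row[col]
        visible_rows.append(projected)
    return visible_rows


# Illustrative data and policy, invented for this example.
employees: List[Row] = [
    {"name": "Ann", "region": "EU", "ssn": "123-45-6789", "salary": 100},
    {"name": "Bob", "region": "US", "ssn": "987-65-4321", "salary": 90},
]
eu_analyst = Policy(
    columns=["name", "region", "ssn"],
    row_filter=lambda row: row["region"] == "EU",
    masks={"ssn": lambda value: "***-**-" + value[-4:]},
)

# Any "engine" that reads through enforce() gets the same restricted view.
result = enforce(employees, eu_analyst)
print(result)
```

Because the policy is applied in one function rather than separately inside each engine, swapping the caller does not change what data is visible, which is the interoperability point the slide makes.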

Kudu: Fast Analytics on Fast Changing Data

Kudu fills the gap: modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS.

[Diagram: HDFS, HBase and Kudu positioned by Pace of Data (Unchanging, Append-Only, Fast Changing / Frequent Updates) and Pace of Analysis (Arbitrary Storage (Active Archive), Real-Time). HDFS: fast scans, analytics and processing of stored data. HBase: fast on-line updates & data serving. Kudu: fast analytics on fast-changing or frequently-updated data, in the Analytic Gap.]

In conclusion

• Spark in the enterprise => we’re well on our way

• Cloudera in the community => we’re doing our part

• The applications you can build => will only get more powerful, more valuable

Thank You