Spark in the Enterprise - 2 Years Later by Alan Saldich

Upload: spark-summit

Post on 15-Feb-2017

TRANSCRIPT

Page 1: Spark in the Enterprise - 2 Years Later by Alan Saldich

© Cloudera, Inc. All rights reserved.

Spark in the Enterprise – 2 Years Later

Alan Saldich – Vice President, Marketing

Page 2: Spark in the Enterprise - 2 Years Later by Alan Saldich

A busy 2 years for Cloudera & Apache Spark

Timeline, 2013 to 2016:

• Announced support for Spark
• Shipped with CDH 4.4
• Spark on YARN integration
• Announced initiative to make Spark the standard execution engine
• Launched first Spark training
• Added Kerberos integration
• Cloudera engineers published O’Reilly Spark book

Page 3: Spark in the Enterprise - 2 Years Later by Alan Saldich

Recent engineering contributions

Integration with Hadoop Ecosystem

• Spark-on-YARN integration
• Dynamic Resource Allocation
• Kafka Integration
• HBase Integration
• Fixed operational issues at scale

Production-Ready Features

• Security
   • Kerberos Integration
   • HDFS Sync (Sentry)
• Governance
   • Cloudera Navigator integration (audit & lineage)
• Monitoring/Troubleshooting
   • Improved debugging
• Zero Data Loss
   • Spark Streaming Resilience

Ongoing Initiatives

• Standard Execution Engine
   • Hive on Spark
   • Pig on Spark
   • Crunch on Spark
   • Solr indexing on Spark

Page 4: Spark in the Enterprise - 2 Years Later by Alan Saldich

2 years, 200+ customers

Page 5: Spark in the Enterprise - 2 Years Later by Alan Saldich

What are they doing with Spark?

[Chart: Commonly Coinstalled (axis 0% to 100%): Hive, HBase, Impala, Solr]

[Chart: Most Popular Use Cases: Batch ETL, Predictive Machine Learning, MPI Alternative, Stream Processing]

Page 6: Spark in the Enterprise - 2 Years Later by Alan Saldich

What are they asking for?

• Security
   • At a minimum, equivalent to market-leading RDBMS
• Performance
   • At least as fast as the systems I’m familiar with today
• Simplicity
   • All the functionality I need to build my application, but not more

Page 7: Spark in the Enterprise - 2 Years Later by Alan Saldich

Current Security Architecture: Inconsistency = Limited Access

Some engines support more granular restrictions than others.

[Diagram, current architecture: Sentry (policy definition); Policy A and Policy B; Impala (column-level); Spark (file-level)]

RecordService: Unified Authorization Enforcement

Unified, Granular Policy Enforcement

A new high-performance security layer that centrally enforces access control policy. Complementing Apache Sentry, which provides unified policy definition, it delivers unified row- and column-based security, and dynamic data masking, to every Hadoop access path.

Benefits:

● Security: Fine-grained permissions and enforcement across Hadoop, building on Sentry.
● Interoperability: Developers don’t need to be aware of on-disk formats and can transparently swap components.

[Diagram, with RecordService: Sentry (policy definition); RecordService (policy enforcement); Impala, Spark, MR]
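
The idea of defining policy once and enforcing it in a single layer for every engine can be illustrated with a small sketch. This is not the RecordService or Sentry API: `Policy`, `POLICIES`, `enforce`, `read_table`, and the two `*_like_read` functions are hypothetical names invented for illustration, and the sample table is made up. It only shows row filtering, column restriction, and masking applied in one place, assuming every access path goes through the same enforcement function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical policy record (stands in for a central policy definition)."""
    allowed_columns: List[str]                        # column-level restriction
    row_filter: Callable[[Row], bool]                 # row-level restriction
    masks: Dict[str, Callable[[object], object]]      # dynamic data masking


def enforce(policy: Policy, rows: List[Row]) -> List[Row]:
    """Single enforcement point: filter rows, drop columns, mask values."""
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row is not visible to this user
        visible = {}
        for col in policy.allowed_columns:
            mask = policy.masks.get(col)
            visible[col] = mask(row[col]) if mask else row[col]
        result.append(visible)
    return result


# Policy is defined once (made-up example policy for a made-up user).
POLICIES = {
    "analyst": Policy(
        allowed_columns=["region", "card"],
        row_filter=lambda r: r["region"] == "EU",
        masks={"card": lambda v: "****" + str(v)[-4:]},
    )
}

# Made-up sample data.
TABLE = [
    {"region": "EU", "card": "4111111111111111", "ssn": "123-45-6789"},
    {"region": "US", "card": "5500000000000004", "ssn": "987-65-4321"},
]


def read_table(user: str) -> List[Row]:
    """All access paths call this, so they all see the same enforced view."""
    return enforce(POLICIES[user], TABLE)


def spark_like_read(user: str) -> List[Row]:
    # Stand-in for one engine's access path.
    return read_table(user)


def impala_like_read(user: str) -> List[Row]:
    # Stand-in for another engine's access path.
    return read_table(user)
```

Because both stand-in engines delegate to `read_table`, neither can return more than the policy allows, which is the consistency property the slide contrasts with per-engine enforcement.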

Page 8: Spark in the Enterprise - 2 Years Later by Alan Saldich

Kudu: Fast Analytics on Fast Changing Data

[Diagram: storage options plotted by Pace of Analysis and Pace of Data. Pace of Data ranges over Unchanging, Append-Only, and Fast Changing / Frequent Updates; Pace of Analysis extends to Real-Time.]

• HDFS: Fast Scans, Analytics and Processing of Stored Data
• HBase: Fast On-Line Updates & Data Serving
• Arbitrary Storage (Active Archive)
• Kudu: Fast Analytics (on fast-changing or frequently-updated data)

Analytic Gap

Kudu fills the Gap: Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS

Page 9: Spark in the Enterprise - 2 Years Later by Alan Saldich

In conclusion

• Spark in the enterprise => we’re well on our way

• Cloudera in the community => we’re doing our part

• The applications you can build => will only get more powerful, more valuable

Page 10: Spark in the Enterprise - 2 Years Later by Alan Saldich

Thank You