Spark Will Replace Hadoop! Know Why

http://www.edureka.co/apache-spark-scala-training

Upload: edureka

Post on 07-Aug-2015

Category: Technology


TRANSCRIPT

Agenda

At the end of the session, you will be able to:

• Understand why you should learn Spark
• Know the advantages of Spark and the 2015 Spark survey
• Discover the Spark career path
• Understand how companies are using Spark

Why Spark?

Rise of Big Data

IDC (International Data Corporation) predicts that by 2020 the world's data volume will have reached 40,000 exabytes (EB), or 40 zettabytes (ZB).

The world's information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth.
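As a rough sanity check on these figures, the sketch below converts the IDC projection into a per-person number. It assumes decimal units (1 ZB = 1,000 EB, 1 EB = 10^9 GB) and a 2020 world population of about 7.7 billion; the population figure and the helper names are assumptions for illustration and do not come from the slides.

```python
# Rough consistency check of the slide's figures (decimal SI units assumed).
GB_PER_EB = 10**9   # 1 exabyte = 10^9 gigabytes
EB_PER_ZB = 1_000   # 1 zettabyte = 1,000 exabytes

WORLD_POPULATION_2020 = 7_700_000_000  # assumption, not from the slides


def zb_to_eb(zettabytes: float) -> float:
    """Convert zettabytes to exabytes."""
    return zettabytes * EB_PER_ZB


def gb_per_person(total_eb: float, population: int) -> float:
    """Spread a total data volume (in EB) evenly over a population, in GB."""
    return total_eb * GB_PER_EB / population


def doublings(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years` if volume doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    total_eb = zb_to_eb(40)
    print(f"40 ZB = {total_eb:,.0f} EB")
    print(f"Per person: {gb_per_person(total_eb, WORLD_POPULATION_2020):,.0f} GB")
    print(f"Growth over 10 years: {doublings(10):.0f}x")
```

Under these assumptions, 40 ZB works out to roughly 5,195 GB per person, which is consistent with the 5,200 GB quoted above, and doubling every two years implies a 32-fold increase over a decade.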

[Chart: Structured Data vs. Unstructured Data, 2005 to 2015]

Application of Big Data

Source: Twitter

Hadoop is not Enough!

Limitations:

• Real-time Processing: Hadoop MapReduce is limited to batch processing. Real-time processing was a big "No" in Hadoop.
• Not Fast Enough: Hadoop MapReduce is fast, but not fast enough.

Conclusion:

Real-time processing is essential and can be achieved using Spark!

Spark Survey and its Advantages

Spark Survey 2015!

Source: Typesafe

Advantages of Spark

Ease of Use

Generality

Runs Everywhere

Up to 100x faster than MapReduce

Feature Comparison

Hadoop MapReduce        Spark
Fast                    Up to 100x faster than MapReduce
Batch Processing        Batch and Real-time Processing
Stores Data on Disk     Stores Data in Memory
Open Source             Open Source
Written in Java         Written in Scala

Source: Databricks
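The disk-versus-memory row matters most for jobs that pass over the same data many times. The following is a toy model in plain Python, not Spark code: `DiskDataset`, `run_iterations`, and the read counter are hypothetical names, used only to show why re-reading the input on every pass costs more than loading it once and keeping it in memory.

```python
from typing import List


class DiskDataset:
    """Hypothetical stand-in for a dataset stored on disk; counts full scans."""

    def __init__(self, records: List[int]) -> None:
        self._records = records
        self.disk_reads = 0

    def read(self) -> List[int]:
        self.disk_reads += 1  # each call models one full scan from disk
        return list(self._records)


def run_iterations(dataset: DiskDataset, passes: int, cache: bool) -> int:
    """Sum the dataset `passes` times, optionally keeping it in memory."""
    cached = dataset.read() if cache else None
    total = 0
    for _ in range(passes):
        # Without caching, every pass goes back to "disk".
        records = cached if cached is not None else dataset.read()
        total += sum(records)
    return total


if __name__ == "__main__":
    disk_style = DiskDataset([1, 2, 3])
    memory_style = DiskDataset([1, 2, 3])
    run_iterations(disk_style, passes=10, cache=False)
    run_iterations(memory_style, passes=10, cache=True)
    print("scans without cache:", disk_style.disk_reads)
    print("scans with cache:", memory_style.disk_reads)
```

Both runs compute the same total; only the number of scans differs. Real speedups depend on the workload and on whether the data fits in memory, so the sketch illustrates the mechanism rather than the 100x figure.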

Spark Features/Modules in Demand

Source: Typesafe

New Features in 2015

DataFrames

• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3

SparkR

• Released in Spark 1.4
• Exposes DataFrames, RDDs & ML library in R

Machine Learning Pipelines

• High Level API
• Featurization
• Evaluation
• Model Tuning

External Data Sources

• Platform API to plug Data-Sources into Spark
• Pushes logic into sources

Source: Databricks
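To illustrate what "pushes logic into sources" can mean for External Data Sources, here is a minimal plain-Python sketch of filter pushdown. It does not use Spark's actual data source API; `InMemorySource`, `scan`, `rows_returned`, and the two query helpers are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class InMemorySource:
    """Hypothetical data source that can apply a filter before returning rows."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.rows_returned = 0  # rows handed over to the processing engine

    def scan(self, pushed_filter: Optional[Predicate] = None) -> Iterator[Row]:
        for row in self._rows:
            if pushed_filter is None or pushed_filter(row):
                self.rows_returned += 1
                yield row


def query_without_pushdown(source: InMemorySource, predicate: Predicate) -> List[Row]:
    # The engine pulls every row, then filters on its own side.
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: InMemorySource, predicate: Predicate) -> List[Row]:
    # The filter travels to the source, so fewer rows cross the boundary.
    return list(source.scan(pushed_filter=predicate))


if __name__ == "__main__":
    rows = [{"id": 1, "country": "IN"}, {"id": 2, "country": "US"},
            {"id": 3, "country": "IN"}, {"id": 4, "country": "DE"}]
    is_india = lambda row: row["country"] == "IN"

    plain, pushed = InMemorySource(rows), InMemorySource(rows)
    assert query_without_pushdown(plain, is_india) == query_with_pushdown(pushed, is_india)
    print("rows transferred without pushdown:", plain.rows_returned)
    print("rows transferred with pushdown:", pushed.rows_returned)
```

Both queries return the same rows; the difference is how many rows leave the source, which is the benefit the slide attributes to pushing logic into sources.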

Spark Career Path

Job Roles & Industry Focus

Source: Typesafe

Salary Trends

Major Companies Using Hadoop

Industry Adoption

Source: Typesafe

How Companies are using Spark?

General Business Goals

Source: Typesafe

Demo

The Big Question!

Is Spark going to replace Hadoop?

Answer: Yes, Spark will be used on top of Hadoop and replace MapReduce.

Reasons:

1. Hadoop MapReduce cannot handle real-time processing
2. Hadoop MapReduce is slower than Spark
3. With the rise of IoT, Spark is a must

Questions

Survey

Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better!

Please spare a few minutes to take the survey after the webinar.
