Hadoop vs Apache Spark


Upload: valuecoders

Post on 13-Jan-2017


TRANSCRIPT

Page 1: Hadoop vs apache spark

Hadoop Vs Apache Spark

Page 2: Hadoop vs apache spark

Hadoop Introduction

● Hadoop is a free, open-source framework for storing large data sets and running distributed analytics on them. It stores large data sets quickly and easily. It is also efficient: processing runs on the nodes that hold the data, so large amounts of data do not have to be transferred.

● Hadoop splits each job into tasks that run in parallel across the cluster. Data warehousing is one of its common uses. The framework keeps big data applications running when individual servers fail.

● Hadoop is widely preferred for batch processing. The framework is written in Java. Developers also use Hive on top of Hadoop to add SQL compatibility.

● Hadoop can be used with little or no programming, because numerous integration services are available.
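
The batch model mentioned above can be illustrated with a word count in the MapReduce style. This is a minimal single-machine sketch in plain Python, not Hadoop's Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as happens between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts collected for one word.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["Hadoop stores data", "Spark processes data"])
```

In a real cluster the map and reduce steps would run on many nodes at once; here they run in sequence only to show the data flow.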

Page 3: Hadoop vs apache spark

Hadoop Advantages

Page 4: Hadoop vs apache spark

Scalability

● One of the key advantages of developing with Hadoop is scalability: large data sets can be easily stored and distributed across the cluster.

● Hadoop supports clusters with a large number of nodes, which allows large amounts of data to be stored and distributed. Compared with a traditional RDBMS, Hadoop is highly scalable.

Page 5: Hadoop vs apache spark

Cost Effective

● Today's big data requirements are enormous, and Hadoop can meet them in a cost-effective manner. Data processing costs much more with traditional database management systems.

● Simplified processing of complex data contributes to Hadoop's cost effectiveness.

Page 6: Hadoop vs apache spark

Flexible Solution

● Hadoop can access and operate on different types of data, which makes it a very flexible solution. This helps generate value from all sorts of gathered data.

● A variety of data sources, such as social media and email, can be used to gather as much useful data as possible.

Page 7: Hadoop vs apache spark

Speed

● Because Hadoop uses a distributed file system, the processing servers and the storage servers are the same machines, which makes processing fast.

● Data is processed efficiently with the Hadoop framework.

Page 8: Hadoop vs apache spark

Reliable

● Hadoop offers a high level of fault tolerance. Data is replicated on different nodes, so a backup copy is available if a node fails.

● This minimizes the chance of data loss. Hadoop is a reliable framework that can withstand both single and multiple failures.
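
The effect of replication on reliability can be sketched with a toy placement model. This is an illustration under stated assumptions, not how HDFS places blocks: the round-robin placement, the node names, and the helper functions are hypothetical, and `replication` must not exceed the number of nodes.

```python
def place_replicas(blocks, nodes, replication=3):
    # Assign each block to `replication` distinct nodes, round-robin.
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement

def readable_blocks(placement, failed_nodes):
    # A block is still readable if at least one replica sits on a live node.
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["b0", "b1", "b2"], nodes)
survivors = readable_blocks(placement, failed_nodes=["node1", "node2"])

# With a single copy per block, losing the holding node loses the block.
single_copy = place_replicas(["b0"], nodes, replication=1)
lost = readable_blocks(single_copy, failed_nodes=["node1"])
```

With three copies per block, every block in this example stays readable after two nodes fail; with one copy, the block on the failed node is gone.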


Page 10: Hadoop vs apache spark

Spark Introduction

● Spark is a tool for processing distributed data, and it can work with data stored by the Hadoop framework. The Spark platform was designed to run on top of Hadoop, where it works as an alternative to the batch model. It can be used to speed up interactive queries and to process real-time data. Spark does not have its own file management system, so it must be integrated with one.

● Spark is considerably faster than Hadoop at processing data. It differs from Hadoop in that it supports analytics on real-time as well as stored data. Spark does not provide the distributed storage system that is essential for big data projects. Spark is also known for advanced data processing and machine learning.

Page 11: Hadoop vs apache spark

Spark Advantages

Page 12: Hadoop vs apache spark

Faster

● Spark places data into Resilient Distributed Datasets (RDDs). This data is kept in memory, which makes it easily accessible.

● Because the data is read from memory, MapReduce-style jobs can run very quickly.
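
Why keeping data in memory speeds up repeated jobs can be shown with a small toy class. This is a plain-Python sketch, not Spark's RDD API: `ToyDataset` and `read_from_disk` are hypothetical names, and the toy always caches, whereas Spark keeps an RDD in memory only when asked to.

```python
class ToyDataset:
    def __init__(self, loader):
        self._loader = loader   # function that reads the data from slow storage
        self._cached = None     # in-memory copy, filled on first use
        self.loads = 0          # how many times the loader was actually run

    def collect(self):
        # Run the loader once, then serve every later call from memory.
        if self._cached is None:
            self._cached = list(self._loader())
            self.loads += 1
        return self._cached

    def map(self, fn):
        # Derive a new dataset; nothing is computed until collect() is called.
        return ToyDataset(lambda: (fn(x) for x in self.collect()))

def read_from_disk():
    # Stand-in for an expensive read from distributed storage.
    return [1, 2, 3, 4]

base = ToyDataset(read_from_disk)
squares = base.map(lambda x: x * x).collect()
doubled = base.map(lambda x: x * 2).collect()
```

Both derived results reuse the same in-memory copy of the base data, so the slow read happens once even though two jobs ran.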

Page 13: Hadoop vs apache spark

Real Time Processing

● Real-time data keeps growing, and processing large quantities of it can be a big challenge.

● Spark can help process logs for live streaming sites, and it can also help with fraud detection and electronic trading data.
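
As a rough idea of what a real-time fraud check looks like, the sketch below flags an account that produces too many events inside a sliding time window. It is plain Python rather than Spark's streaming API; the `RateMonitor` class, the thresholds, and the sample events are all invented for illustration, and timestamps are assumed to arrive in order.

```python
from collections import defaultdict, deque

class RateMonitor:
    def __init__(self, window_seconds=60, max_events=3):
        self.window = window_seconds
        self.max_events = max_events
        self.recent = defaultdict(deque)  # account -> timestamps inside the window

    def observe(self, account, timestamp):
        times = self.recent[account]
        times.append(timestamp)
        # Drop events that have fallen out of the time window.
        while timestamp - times[0] >= self.window:
            times.popleft()
        # Flag the account when the window holds more events than allowed.
        return len(times) > self.max_events

monitor = RateMonitor(window_seconds=60, max_events=3)
events = [("acct-1", 0), ("acct-1", 10), ("acct-2", 15),
          ("acct-1", 20), ("acct-1", 30), ("acct-1", 200)]
flags = [monitor.observe(account, ts) for account, ts in events]
```

Each event is judged as it arrives, using only the recent history kept in memory, which is the property that separates stream processing from a batch job over stored data.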

Page 14: Hadoop vs apache spark

Using Big Data Effectively

● Big data needs to be used effectively to reach the right set of people with the right messaging. Targeting very specific audiences can produce the best conversion rate for a retail business. Many retail marketers fail to get the right results because they do not understand how to make the data usable and how to analyse it.

● The technology has to be fully prepared for big data usage and integration.

Page 15: Hadoop vs apache spark

Processing of Graphs

● Graph processing helps capture the relationships between data and entities.

● It helps in analysing social as well as advertising data. Machine learning helps carry out advanced analytics and build an understanding of consumers.
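
A small example of capturing relationships between entities: the sketch below builds an undirected graph from pairs of connected users and groups them into connected components. It is plain Python, not a Spark graph library; the helper names and the sample edges are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(edges):
    # Store each relationship in both directions (undirected graph).
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_components(graph):
    # Breadth-first search from every node that has not been visited yet.
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
graph = build_graph(edges)
components = connected_components(graph)
```

On social data, each component is a group of people linked to one another directly or through others, which is one kind of relationship that graph processing brings out.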

Page 16: Hadoop vs apache spark

Power

● Most companies need two systems: one for storing and streaming data, and another for analyzing it.

● Spark simplifies application development, maintenance and deployment.

Page 17: Hadoop vs apache spark

Get in Touch

www.valuecoders.com www.facebook.com/valuecoders www.twitter.com/valuecoders www.linkedin.com/valuecoders