Introduction to Cloud Computing and Big Data - Hadoop

Introduction: Cloud Computing and Big Data - Hadoop
Presented By: Nagarjuna D.N, SAP CTL, AT&T, Bengaluru
Date: 14-07-2015


Page 1: Introduction to Cloud computing and  Big Data-Hadoop

Introduction: Cloud Computing and Big Data - Hadoop

Presented By: Nagarjuna D.N, SAP CTL, AT&T, Bengaluru

Date: 14-07-2015

Page 2: Introduction to Cloud computing and  Big Data-Hadoop

Overview

• Cloud Computing Evolution

• Why is Cloud Computing needed?

• Cloud Computing Models

• Cloud Solutions

• Cloud job opportunities

• Criteria for Big Data

• Big Data challenges

• Technologies to process Big Data- Hadoop

• Hadoop History and Architecture

• Hadoop Eco-System

• Hadoop Real-time Use cases

• Hadoop Job opportunities

• Hadoop and SAP HANA integration

• Summary

2

Page 3: Introduction to Cloud computing and  Big Data-Hadoop

Internet of Things (IoT)

Big Data: “One of the reasons is Cloud Computing!”

3

Page 4: Introduction to Cloud computing and  Big Data-Hadoop

Cloud Computing (an evolution of the Internet, hidden from the end user)

• Infrastructure is maintained elsewhere, with shared computing resources (servers, storage, and networking) all delivered over the Internet.

• The Cloud delivers a hosting environment that is immediate, flexible, scalable, secure, and available, and that saves corporations money, time, and resources.


Page 5: Introduction to Cloud computing and  Big Data-Hadoop

Cloud Computing (Cont….)

• In addition, the platform provides on-demand services, i.e., always on: anywhere, anytime, any place.

• “Pay for what you use”: billed on a metered basis.

• It is based on utility computing and virtualization.
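The metered, pay-for-what-you-use billing idea can be sketched with a toy cost comparison. All rates below are invented for illustration and do not reflect any real provider's pricing:

```python
# Illustrative only: hypothetical rates, not any real provider's pricing.
def metered_cost(hours_used, rate_per_hour):
    """Cloud model: pay only for metered usage."""
    return hours_used * rate_per_hour

def fixed_cost(months, monthly_lease):
    """Traditional model: pay for capacity whether it is used or not."""
    return months * monthly_lease

# A server used 100 hours in a month at $0.10/hour vs. a $200/month lease.
cloud = metered_cost(100, 0.10)
onprem = fixed_cost(1, 200)
print(f"cloud: ${cloud:.2f}, on-premise: ${onprem:.2f}")
```

The break-even point depends on utilization: lightly used workloads favor metered billing, while machines running flat-out around the clock may be cheaper to own.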

5

Page 6: Introduction to Cloud computing and  Big Data-Hadoop

Cloud Computing History

Page 7: Introduction to Cloud computing and  Big Data-Hadoop

Traditional Infrastructure Model

[Chart: forecasted infrastructure demand (capital vs. time)]

7

Page 8: Introduction to Cloud computing and  Big Data-Hadoop

Acceptable Surplus

[Chart: forecasted infrastructure demand with an acceptable surplus (capital vs. time)]

8

Page 9: Introduction to Cloud computing and  Big Data-Hadoop

Actual Infrastructure Model

[Chart: actual infrastructure demand (capital vs. time)]

9

Page 10: Introduction to Cloud computing and  Big Data-Hadoop

Unacceptable Surplus

[Chart: unacceptable surplus of capacity over actual demand (capital vs. time)]

10

Page 11: Introduction to Cloud computing and  Big Data-Hadoop

Unacceptable Deficit

[Chart: unacceptable deficit of capacity under actual demand (capital vs. time)]

11

Page 12: Introduction to Cloud computing and  Big Data-Hadoop

Utility Infrastructure Model (Concept of Cloud Computing)

[Chart: capacity tracking actual infrastructure demand (capital vs. time)]
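The contrast between the fixed-forecast charts and the utility model can be put in numbers. The demand figures below are invented for illustration:

```python
# Hypothetical demand per period (arbitrary units), for illustration only.
actual_demand = [10, 25, 60, 40, 90, 30]

# Traditional model: capacity is fixed up front at a forecast level.
forecast_capacity = 70
surplus = [max(forecast_capacity - d, 0) for d in actual_demand]  # idle capital
deficit = [max(d - forecast_capacity, 0) for d in actual_demand]  # unmet demand

# Utility (cloud) model: capacity tracks actual demand, so both lists
# would be all zeros - you rent exactly what each period needs.
print("surplus per period:", surplus)
print("deficit per period:", deficit)
```

Whatever forecast level is chosen, the fixed model pays for surplus in slow periods or suffers a deficit at the peak; the utility model avoids both.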

12

Page 13: Introduction to Cloud computing and  Big Data-Hadoop

Cloud Flavors (Service Models)

• IaaS – Infrastructure as a Service

• PaaS – Platform as a Service

• SaaS – Software as a Service

13

Page 14: Introduction to Cloud computing and  Big Data-Hadoop

SaaS Examples

14

Page 15: Introduction to Cloud computing and  Big Data-Hadoop

IaaS Examples

15

Page 16: Introduction to Cloud computing and  Big Data-Hadoop

PaaS Examples

16

Page 17: Introduction to Cloud computing and  Big Data-Hadoop

Cloud Deployment Models

• Public Cloud

• Private Cloud

• Hybrid Cloud

• Community Cloud

17

Page 18: Introduction to Cloud computing and  Big Data-Hadoop

Cloud Distribution Examined

18

Page 19: Introduction to Cloud computing and  Big Data-Hadoop

Enterprise Cloud Solutions

1. Test / Development / QA Platform
o Use cloud infrastructure servers as a test and development platform

2. Disaster Recovery
o Keep images of servers on cloud infrastructure, ready to go in case of a disaster

3. Cloud File Storage
o Back up or archive company data to cloud file storage

4. Load Balancing
o Use cloud infrastructure for overflow management during peak usage times

19

Page 20: Introduction to Cloud computing and  Big Data-Hadoop

Enterprise Cloud Solutions (cont)

5. Overhead Controlo Lower overhead costs and make bids more competitive

6. Distributed Network Control and Cost Reportingo Create an individual private networks (VPC) for each of

subsidiaries or contracts7. Rapid Deploymento Turn up servers immediately to fulfill project timelines

8. Functional IT Labor Shifto Refocus IT labor expense on revenue producing activities

20

Page 21: Introduction to Cloud computing and  Big Data-Hadoop

Preparing for the Future: Cloud IT Jobs

A sampling of IT skills likely to be in demand in the future:

o Functional application development and support (e.g., Oracle, SAP, SQL; linking hardware to software)

o Leveraging data to make strategic business decisions (e.g., Business Intelligence: applying sales forecasts to inventory and manufacturing decisions)

o Mobile apps (Android, iPhone, Windows Mobile)

o Wi-Fi engineers (USF to include broadband communications; LTE replaces GSM/CDMA)

o Optical engineers (optical offers the highest bandwidth today: PON, CWDM, DWDM)

o Virtualization specialists (economies of scale require virtualization: server, storage, client…)

o IP engineers

o Network security specialists

o Web developers

o Social media developers

o Business Intelligence application development and support

21

Page 22: Introduction to Cloud computing and  Big Data-Hadoop
Page 23: Introduction to Cloud computing and  Big Data-Hadoop

IT Cloud infrastructure

23

Page 24: Introduction to Cloud computing and  Big Data-Hadoop

“Big Data- Big Thing”

• Big Data is exactly like a Rubik’s cube.

• Just like a Rubik’s cube, Big Data has many different solutions.

• Take five Rubik’s cubes, mix them up the same way, and give them to five different experts.

• Each will solve the cube in a matter of seconds.

• But if you watch closely, you will notice that even though the final outcome is the same, the route taken to solve the cube is not.

• Every expert starts at a different place (color) and resolves the cube with different methods.

• It is nearly impossible for two experts to take the exact same route.

Beginning Big Data

24

Page 25: Introduction to Cloud computing and  Big Data-Hadoop

25

Page 26: Introduction to Cloud computing and  Big Data-Hadoop

Big Data Definition in general

• Big Data is a collection of data sets that are large and complex in nature.

• They constitute both structured and unstructured data that grow so large, so fast, that they are not manageable by traditional relational database systems (e.g., RDBMS).

26

Page 27: Introduction to Cloud computing and  Big Data-Hadoop

Big Data Technically

i. Volume: petabytes or zettabytes.

ii. Velocity: batch or real-time (stream) processing.

iii. Variety: structured, semi-structured & unstructured. It is estimated that 80% of the world’s data is unstructured, and the rest semi-structured and structured.

iv. Veracity: the quality of the data being captured can vary greatly.

Fig.: Big Data based on the 3Vs model (Doug Laney), extended with Veracity

27

Page 28: Introduction to Cloud computing and  Big Data-Hadoop

Variety of Data

1. Structured Data: data that is identifiable because it is organized in a structure (a standard, defined format). E.g.: databases, data warehouses & electronic spreadsheets.

2. Semi-Structured Data: data that is neither raw data nor typed data in a conventional database system. E.g.: wiki pages, tweets, Facebook data & instant messages.

3. Unstructured Data: data that does not have a standard, defined structure. E.g.: data files, audio files, video, graphics & multimedia.

28

Page 29: Introduction to Cloud computing and  Big Data-Hadoop

Traditional Data v/s Big Data

Attribute | Traditional Data | Big Data

Volume | Gigabytes to terabytes | Petabytes to zettabytes

Organization | Centralized | Distributed

Structure | Structured | Semi-structured & unstructured

Data model | Strict schema-based | Flat schema

Data relationship | Complex interrelationships | Almost flat with few relationships

29

Page 30: Introduction to Cloud computing and  Big Data-Hadoop

Criteria of Big Data

1. 72 hours of video are uploaded to YouTube every minute, and over 3 billion hours of video are watched every month.

2. Radio Frequency ID (RFID) systems generate up to 1,000 times more data than conventional bar code systems.

3. 340 million tweets are sent every day, amounting to 7 TB of data.

4. The social networking site Facebook processes over 10 TB of data every day.

5. Over 5 billion people use cell phones to call, send SMS, email, browse the Internet, and interact via social networking sites.

6. The Square Kilometre Array radio telescope project is designed to receive 700 TB of data per second.
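As a quick sanity check, the tweet figures above imply roughly 20 KB per tweet once metadata is included (taking 1 TB as 10^12 bytes):

```python
# Back-of-the-envelope check of the tweet-volume figures above.
tweets_per_day = 340_000_000
data_per_day_bytes = 7 * 10**12          # 7 TB, with 1 TB = 10^12 bytes

per_tweet = data_per_day_bytes / tweets_per_day
print(f"≈ {per_tweet / 1e3:.1f} KB per tweet (including metadata)")
```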

30

Page 31: Introduction to Cloud computing and  Big Data-Hadoop

Challenges with Big Data

1. Scaling is costly.

2. A strategy must be in place before you hit the limit of a single computer.

3. Most enterprises respond to scalability needs only when they start facing problems of poor response and low throughput.

4. Adding hardware to an existing system is manpower-intensive and hence error prone.

5. Mixed data types (structured and unstructured) make scaling even harder.

31

Page 32: Introduction to Cloud computing and  Big Data-Hadoop

Exploring Big Data for business insights

32

Page 33: Introduction to Cloud computing and  Big Data-Hadoop

33

Page 34: Introduction to Cloud computing and  Big Data-Hadoop

Big Data solutions with Hadoop

34

Page 35: Introduction to Cloud computing and  Big Data-Hadoop

Organizations Adopted Big Data

35

Page 36: Introduction to Cloud computing and  Big Data-Hadoop

How are Organizations using Big Data Technology?

36

Page 37: Introduction to Cloud computing and  Big Data-Hadoop

37

Page 38: Introduction to Cloud computing and  Big Data-Hadoop

Feb 14th, 2011: Watson is IBM’s supercomputer built using Big Data technology. It is not online, and it processes information much like a human brain.

38

Page 39: Introduction to Cloud computing and  Big Data-Hadoop

39

Page 40: Introduction to Cloud computing and  Big Data-Hadoop

Tools typically used in Big Data Scenarios

40

Page 41: Introduction to Cloud computing and  Big Data-Hadoop

Technology to process Big Data- Hadoop (Open-source software framework written in Java)

• Open-source software: It's free to download, though more and more commercial versions of Hadoop are becoming available.

• Framework: It means that everything you need to develop and run software applications is provided –programs, connections, etc.

• Distributed storage: The Hadoop framework breaks big data into blocks, which are stored on clusters of commodity hardware.

• Processing power: Hadoop concurrently processes large amounts of data using multiple low-cost computers for fast results.

• Hadoop is a distributed file system (DFS), not a database. It is designed for information in many forms.

• Open-source project started by Doug Cutting, then an employee of Yahoo. Hadoop is the name of his son’s toy elephant.

• Apache Software Foundation: Apache Hadoop.
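The distributed-storage bullet above (breaking big data into blocks) can be sketched as follows. This is a simplified illustration, not Hadoop's actual code; 128 MB is the default HDFS block size in Hadoop 2.x (earlier versions defaulted to 64 MB), and the function name is made up:

```python
# Sketch of HDFS-style block splitting (illustrative, not real Hadoop code).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS default

def split_into_blocks(file_size):
    """Return (block_id, length_in_bytes) pairs covering a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

# A 300 MB file becomes three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print([(i, n // (1024 * 1024)) for i, n in blocks])
```

Each block is then stored (and replicated) across different machines in the cluster, which is what lets many nodes process the same file in parallel.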

41

Page 42: Introduction to Cloud computing and  Big Data-Hadoop

Hadoop Creation History

42

Page 43: Introduction to Cloud computing and  Big Data-Hadoop

Hadoop Architecture

Hadoop core has two major components (daemons):

1. HDFS
a. NameNode
b. Secondary NameNode
c. DataNode

2. MapReduce Engine (distributed data processing framework)
a. JobTracker
b. TaskTracker
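The division of labor between the NameNode and the DataNodes can be sketched with a toy lookup. The file path, block IDs, and node names below are hypothetical:

```python
# Toy sketch of the NameNode's role: it stores only metadata (which
# DataNodes hold each block); the data itself lives on the DataNodes.
namenode_metadata = {
    "/logs/web.log": {0: ["dn1", "dn2", "dn3"],   # block 0 replica locations
                      1: ["dn2", "dn3", "dn4"]},  # block 1 replica locations
}

def locate(path):
    """A client asks the NameNode where a file's blocks live, then
    reads the blocks directly from those DataNodes."""
    return namenode_metadata[path]

print(locate("/logs/web.log")[0])
```

This is why the NameNode is the master: lose it and the cluster no longer knows where any block lives, even though the blocks themselves are intact on the DataNodes (the Secondary NameNode checkpoints this metadata).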

46

Page 44: Introduction to Cloud computing and  Big Data-Hadoop

What components make up Hadoop?

• Hadoop Common – the libraries and utilities used by other Hadoop modules.

• Hadoop Distributed File System (HDFS) – the Java-based scalable system that stores data across multiple machines without prior organization.

• MapReduce – a software programming model for processing large sets of data in parallel.

• YARN – resource management framework for scheduling and handling resource requests from distributed applications. (YARN is an acronym for Yet Another Resource Negotiator.)

45

Page 45: Introduction to Cloud computing and  Big Data-Hadoop

Hadoop Architecture

[Diagram: the master node runs the Job Tracker (MapReduce) and Name Node (HDFS); each slave node runs a Task Tracker and a Data Node]

47

Page 46: Introduction to Cloud computing and  Big Data-Hadoop


48

Page 47: Introduction to Cloud computing and  Big Data-Hadoop


49

Page 48: Introduction to Cloud computing and  Big Data-Hadoop

[Diagram: nodes are grouped into racks, racks into a cluster, and clusters into a data center]
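This node/rack hierarchy matters for HDFS's rack-aware replica placement. The default policy is commonly described as: first replica on the writer's node, second on a node in a different rack, third on a different node in that same remote rack. A simplified sketch, with hypothetical rack and node names:

```python
import random

# Simplified sketch of HDFS's default rack-aware placement for 3 replicas.
def place_replicas(racks, writer):
    """racks: {rack_name: [node, ...]}; writer: (rack_name, node)."""
    w_rack, w_node = writer
    first = w_node                                      # 1st: writer's node
    remote_rack = random.choice([r for r in racks if r != w_rack])
    second = random.choice(racks[remote_rack])          # 2nd: different rack
    third = random.choice(                              # 3rd: same remote rack,
        [n for n in racks[remote_rack] if n != second]) #      different node
    return [first, second, third]

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
print(place_replicas(racks, ("rack1", "n1")))
```

The policy balances write cost (only one cross-rack transfer on the second replica) against fault tolerance (losing an entire rack still leaves a copy elsewhere).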

50

Page 49: Introduction to Cloud computing and  Big Data-Hadoop

51

Page 50: Introduction to Cloud computing and  Big Data-Hadoop

MapReduce Example

52
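The MapReduce example on this slide is an image in the original deck; the canonical example is word count. A minimal Python simulation of the map, shuffle, and reduce phases (this mimics the data flow only, not the actual Hadoop API):

```python
from itertools import groupby
from operator import itemgetter

# The classic word-count job, sketched as MapReduce's three phases.
def map_phase(line):
    """Map: emit a <word, 1> pair for every word in an input line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework does."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {k: [v for _, v in g] for k, g in groupby(pairs, key=itemgetter(0))}

def reduce_phase(word, counts):
    """Reduce: aggregate the grouped values for one key."""
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
print(result)  # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the map calls run in parallel on the nodes holding each input split, and the shuffle moves data across the network so each reducer sees all values for its keys.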

Page 51: Introduction to Cloud computing and  Big Data-Hadoop

Benefits of Hadoop

• Scalable: new nodes can be added without needing to change data formats.

• Cost effective: Hadoop brings massively parallel computing to commodity hardware.

• Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.

• Fault tolerant: when you lose a node, the system redirects work to another copy of the data and continues processing without missing a beat.

• Programming languages: Java (default) / Python.

• Last but not least, it’s free (open source).

43

Page 52: Introduction to Cloud computing and  Big Data-Hadoop

Hadoop is not Suitable for All Kinds of Applications

Hadoop is not suitable to:

• perform real-time, stream-based processing where data is processed immediately upon its arrival.

• perform online access where low latency is required.

44

Page 53: Introduction to Cloud computing and  Big Data-Hadoop

Hadoop Eco-System

53

Page 54: Introduction to Cloud computing and  Big Data-Hadoop

Real-Time Hadoop Use Cases

1. Risk Modeling (How can banks understand customers & markets ?)

2. Customer churn analysis (why do companies really lose customers?)

3. Ad Targeting (How can companies increase campaign efficiency?)

4. Point-of-sale transaction analysis (How do retailers target promotions guaranteed to make you buy?)

5. Search quality (What’s in your search?)

54

Page 55: Introduction to Cloud computing and  Big Data-Hadoop

55

Page 56: Introduction to Cloud computing and  Big Data-Hadoop

56

Page 57: Introduction to Cloud computing and  Big Data-Hadoop

Hadoop Job Opportunities

57

Page 58: Introduction to Cloud computing and  Big Data-Hadoop

58

Page 59: Introduction to Cloud computing and  Big Data-Hadoop

Apache Hadoop & SAP HANA Integration(Future Generation Technologies)

59

Page 60: Introduction to Cloud computing and  Big Data-Hadoop

In Real-Time Business

60

Page 61: Introduction to Cloud computing and  Big Data-Hadoop

Resources

61

Page 62: Introduction to Cloud computing and  Big Data-Hadoop

Summary

o Cloud Computing
o Big Data
o Apache Hadoop
o Hadoop and SAP HANA integration

62

Page 63: Introduction to Cloud computing and  Big Data-Hadoop

Thank You

Page 64: Introduction to Cloud computing and  Big Data-Hadoop

More Details

Nagarjuna D.N
[email protected]
[email protected]

More Cloud Solutions Architect Skills:

• Amazon Cloud (Amazon Web Services)

• MongoDB (NoSQL Database)

• Play Framework (Web Application Framework)

• Domain/ SSL Certificate setup

• Apache Hadoop, Apache Pig, Apache Hive

Page 65: Introduction to Cloud computing and  Big Data-Hadoop

Your Valuable Feedback Please

• Do tell me where I must improve!