big data gaurav

Post on 10-May-2015

Category:

Education

DESCRIPTION

Big Data Class 1

TRANSCRIPT

Class 1: Introduction to Big Data

Understanding Big Data

Business Applications of Big Data

Technologies for handling Big Data

Big Data Management Systems – Databases & Warehouses

Analytics & Big Data

Topic 1 – Understanding Big Data

Class 1: Introduction to Big Data

What is Big Data?

Structuring & Elements

Application in Business & Careers

DATA

Personal Computers

Facebook

Twitter

YouTube

Google

ATMs

Dropbox

Picasa

2002: 5 Exabytes of Online Data

2009: 281 Exabytes of Online Data (a 56-fold increase)
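
The 56-fold increase quoted above follows directly from the two data points. The sketch below reproduces the arithmetic; the annualised growth rate is an added illustration that assumes smooth compound growth between 2002 and 2009, which the slides do not claim.

```python
# Online data volumes quoted on the slide, in exabytes.
volume_2002_eb = 5
volume_2009_eb = 281

# Overall growth factor: how many times larger 2009 is than 2002.
growth_factor = volume_2009_eb / volume_2002_eb

# Assumption (not from the slides): smooth compound growth over the 7 years.
years = 2009 - 2002
annual_growth = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_growth:.0%}")
```

The growth factor comes out at 56.2, which the slide rounds down to 56.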

What is Big Data?

Big Data is a pool of large-sized datasets for which related information or data must be captured, stored, searched, shared, transferred, analysed, and visualised within an acceptable elapsed time.

Data = Information

Information = Insight

• Every second, consumers make 10,000 payment card transactions worldwide

• Every hour, Walmart handles more than 1 million customer transactions

• Every day, Twitter users post 500 million tweets

• Every day, Facebook users post 2.7 billion likes and comments

BIG DATA

Is a new data challenge that requires leveraging existing systems differently

Is classified in terms of:

• Volume (terabytes, records, transactions)

• Variety (internal, external, behavioural, and/or social)

• Velocity (near-real-time or real-time assimilation)

Is usually unstructured and qualitative in nature

Advantages of Studying Big Data:

• Understanding target customers

• Cutting down expenditure in healthcare

• Increasing operating margins in retail

• Raising profits through improvements in operational efficiency

Industries that Benefit:

• Sports

• Science and Research

• Security and Law Enforcement

• Financial Trading

Departments that can Benefit:

• Procurement

• Product Development

• Manufacturing

• Distribution

• Marketing

• Price Management

• Merchandising

• Sales

• Store Operations

• Human Resources

Flu Indications & Warnings

Massive Data Collection

Analyse Collected Data

Early Warnings of a Flu Outbreak

Social Data from Networking Sites reveals Behavioural Patterns

Use Big Data for Growth & Value Addition

RECAP

What is Big Data, its advantages and various sources

Topic 1 – Understanding Big Data

Class 1 - Introduction to Big Data

What is Big Data?

Structuring & Elements

Application in Business & Careers

How do I choose a book, of the millions available on my favorite sites or stores?

How can I use the vast amount of data and information I come across?

How do I keep myself updated of events, news?

Which news articles should I read?

Formats of Data:

Sources of Data:

• Internal – organisational or enterprise data

• External – social data from the internet or government

BIG DATA

• Structured Data

• Unstructured Data

• Semi-Structured Data

Structured Data

Features of Structured Data:

• Has a predefined format

• Resides in fixed fields within a record

• Has its attributes mapped

• Is used to report against predetermined data types

Sources of Structured Data:

• Relational databases

• Flat files in record format

• Multidimensional databases

• Legacy databases

Unstructured Data

Sources of Unstructured Data:

• Organisational Data

• Social Media

• Mobile Data

Challenges of Using Unstructured Data:

• Making sense of the data is difficult and time-consuming

• Combining and linking unstructured data to more structured information is difficult

• Costs rise because of wasted storage and the human resources needed

Semi-Structured Data

Sources of Semi-Structured Data:

• Database systems

• File systems, such as Web data and bibliographic data

• Data exchange formats, such as those used for scientific data

Sl. No   Name                                  E-mail

1.       Sam Jacobs                            smj@xyz.com

2.       First Name: David, Last Name: Brown   davidb@xyz.com
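
The two rows above carry the same kind of information but with different internal structure, which is what makes the data semi-structured. The following Python sketch models them as dictionaries; the field names (`sl_no`, `name`, `email`, `first`, `last`) are assumptions chosen for illustration, not part of the slides.

```python
# Two records shaped like the rows in the table above: the first stores the
# name as one field, the second splits it into first and last name.
records = [
    {"sl_no": 1, "name": "Sam Jacobs", "email": "smj@xyz.com"},
    {"sl_no": 2, "name": {"first": "David", "last": "Brown"},
     "email": "davidb@xyz.com"},
]


def full_name(record):
    """Return the name as a single string, whichever shape the record uses."""
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first']} {name['last']}"
    return name


names = [full_name(r) for r in records]
print(names)  # ['Sam Jacobs', 'David Brown']
```

Code that consumes semi-structured data typically has to inspect each record's shape in this way, because no fixed schema guarantees it.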

Volume

Velocity

Variety

Big Data Application In Business Analytics

What are the areas where Big Data can be applied?

Transportation

Provides improved traffic information and autonomous features

Education

Gives teachers innovative approaches to analyze students

Travel

Apply analytics to pricing, inventory, and advertising to improve customer experiences

Governments

To make informed decisions for fraud management, discover unknown threats, ensure security of global supply chain

Healthcare

To establish clinical protocols that ensure the best health outcome for patients

Careers in Big Data

BIG Career Opportunities

Major Big Data Hiring Companies:

Product companies, e.g., Oracle

Technology drivers, e.g., Google

Services companies, e.g., EMC

Data analytics companies, e.g., Splunk

The most common job titles in Big Data include:

• Big Data Analyst

• Big Data Scientist

• Big Data Developer

Module 1: Introduction to Big Data

Big Data Analyst Certification Track

• Module 2: Introduction to Analytics & R Programming

• Module 3: Data Analysis Using R

• Module 4: Advanced Analytics Using R

• Module 5: Machine Learning Concepts

• Module 6: Social Media, Mobile Analytics & Visualisation

• Module 7: Industry Applications of Big Data

Big Data Developer Certification Track

• Module 2: Managing a Big Data Ecosystem

• Module 3: Storing & Processing Data: HDFS & MapReduce

• Module 4: Increasing Efficiency with Hadoop Tools

• Module 5: Additional Hadoop Tools: ZooKeeper, Sqoop, Flume, YARN & Storm

• Module 6: Leveraging NoSQL & Hadoop: Real Time, Security & Cloud

• Module 7: Commercial Hadoop Distribution & Management Tools

Complete Project

Wrox Certified Big Data Analyst/Developer

Technical Skills Required for a Big Data Analyst:

• Handling & analysing massive data sets using MapReduce

• Hadoop and its components HBase & Hive

• SQL and NoSQL query tools and languages such as Impala, Hive and Pig

• Analytical tools such as SAS, R, Tableau

• Statistical techniques to implement text analytics solutions

• Data handling and manipulation techniques

• Generating client-ready dashboards, reports and visualizations

Soft Skills Required:

• Strong written & verbal communication skills

• Analytical ability

• Basic understanding of how a business works

Future of Big Data

RECAP

What are the various types and structures of Big Data and the elements that form it

What are the business applications of Big Data and the career opportunities associated

Topic 2 – Business Applications of Big Data

Class 1: Introduction to Big Data

Significance of Social Network Data

Financial Fraud & Big Data

Fraud Detection in Insurance

Use in Retail Industry

Significance of Social Network Data

What is Social Network Data?

What is Social Network Analysis?

What are the uses of Social Network Data Analysis?

What is Sentiment Analysis?

Social Media Data

• Age

• Gender

• Location

Social Network Analysis (SNA)

Social Network, Data, Analysis

Total Number of calls

Total Number of SMS

Structure of a Caller’s Social Network

Social Network Site

Social Network Analysis: a Big Data Problem

Social Network Analysis (SNA)

• Business Intelligence

• Marketing

• Product Design & Development

• Customer Relationship Management (CRM)

Social Network Data Analysis

• Provides new contexts in which decisions are data-driven, not opinion-driven

• Allows organizations to shift their goals towards maximizing the profitability of a customer's network

• Allows organizations to identify highly connected customers

• Allows organizations to lure highly connected customers with free trials and solicit their feedback

• Allows organizations to encourage internal customers to become more active
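
One simple way to identify highly connected customers is to count each customer's distinct contacts in call records. The sketch below does this in plain Python; the call data and customer IDs are invented, and counting contacts (degree) is only one of several possible measures of connectedness.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee). The IDs are made up.
calls = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("E", "A"), ("F", "A"),
]

# Build an undirected contact network: who has been in touch with whom.
contacts = defaultdict(set)
for caller, callee in calls:
    contacts[caller].add(callee)
    contacts[callee].add(caller)

# Degree = number of distinct contacts; a simple proxy for "highly connected".
degree = {customer: len(peers) for customer, peers in contacts.items()}
ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))

print(ranked[:3])  # [('A', 5), ('B', 2), ('C', 2)]
```

Customer A, with five distinct contacts, would be the natural candidate for a free trial in this toy network.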

Social Data Analysis

Analyze Media Communication

Product Development and Offerings

Sentiment Analysis

Marketers Business Professionals

3,46,259 Followers

2,73,591 Likes

Yet the airline is one of the most disliked. Why?
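
Follower and like counts say nothing about what people actually write, which is the gap sentiment analysis fills. The sketch below scores posts with a tiny hand-made word list; the lexicon, the posts, and the airline they refer to are all invented, and production systems use much richer models than word counting.

```python
# Tiny hand-made sentiment lexicon; a real system would use a far larger one.
POSITIVE = {"great", "love", "friendly", "comfortable"}
NEGATIVE = {"delayed", "lost", "rude", "worst", "cancelled"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Invented posts about a hypothetical airline.
posts = [
    "Flight delayed again and my bag was lost!",
    "Rude staff, worst airline ever.",
    "Love the friendly crew.",
]

scores = [sentiment_score(p) for p in posts]
negative_share = sum(s < 0 for s in scores) / len(scores)
print(scores, f"{negative_share:.0%} negative")  # [-2, -2, 2] 67% negative
```

A page can therefore have many followers while most of the conversation about it is negative.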

RECAP

What is social network data and analysis

What are its uses and values

Topic 2 – Business Applications of Big Data

Class 1: Introduction to Big Data

Significance of Social Network Data

Financial Fraud & Big Data

Fraud Detection in Insurance

Use in Retail Industry

Common Financial Frauds

• Credit Card Frauds

• Exchange or Return Policy Fraud

• Personal Information Fraud

Prevent Frauds

• Understand customers' ordering patterns

• Watch out for red flags

Big Data

Fraud

Analyzing data with traditional methods

• Small sample size: the various patterns of fraud can be understood

• Large sample size: the various patterns of fraud cannot be understood

• The sample size could not be increased, because doing so required huge investments of time and money

• Big Data techniques can overcome this challenge

Big Data analytics can…

Run checks on all data to identify fraudulent transactions

Identify new ways of committing fraud and add them to the set of fraud-prevention checks

Avoid impeding customers with unnecessary policies and governance structures

Fraud Detection in Real Time

BIG DATA

live transactions

sources of data

BIG DATA

Historical data indicates fraud patterns

Checks to prevent real-time fraud

Real-time Analysis

BIG DATA

Create comparisons

Drawing Maps & Graphs

Decisions and effective systems

BLOCK FRAUD

Topic 2 – Business Applications of Big Data

The Significance of Social Network Data

Financial Fraud and Big Data

Fraud Detection in Insurance

Use of Big Data in the Retail Industry

Insurance Company

• Incurs a steady increase in the cost of litigation and fraudulent claims

• Underwriters do not have the required data at the right time to make the necessary decisions, which further delays processing

• Needs to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time

BIG DATA

Social Media Data

Note for underwriter

Social Media Triggers to Identify Fraud

In a claim, a customer might indicate that his or her car was destroyed in a flood.

Documentation from the social media feed shows that the car was actually in another city on the day the flood occurred.

Such glaring discrepancies indicate FRAUD.

Insurance Frauds

Have huge cost implications for organizations

Have a negative impact on customers, as losses are passed on to them as higher premiums

Lead organizations to prefer Big Data analytics and other advanced technologies

Big Data analytics platform

Organizations are now able to analyze complex information and accident scenarios in minutes rather than days or months

INSURANCE

Traditional methods of identifying fraud:

• Typically use small samples of data for analysis

• Rely on previously recorded fraud cases

• Every time a fraud based on a new technique occurs, insurance companies have to bear the consequences and the losses the first time

• Work in independent silos

• Are not capable of handling various sources of information from different channels and different functions in an integrated way

Fraud Detection Methods

Statistical Models

Public Data

Bank Statements

Legal Judgments

Criminal Records

Medical Bills

Social Network Analysis (SNA)

Big Data can be used to create visibility into blind spots for businesses

SNA is an innovative and effective way to identify and detect frauds

SNA tool uses a mix of analytical methods

• Statistical methods

• Pattern analysis

• Link analysis

When link analysis is used in fraud detection:

• It looks for clusters of data and for how those clusters are linked to other data clusters

• Public records are among the various data sources that can be integrated into a model

• The insurer can rate claims

If the rating is high, it indicates that the claim may be fraudulent, for example because of:

• a known bad address

• a suspicious provider

• a vehicle that was involved in many accidents with multiple carriers
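
The claim rating described above can be sketched as a count of red flags that a claim is linked to. Everything specific here is an assumption made for illustration: the reference lists, the field names, the thresholds, and the one-point-per-flag scheme. A real insurer's model would be considerably more involved.

```python
# Hypothetical reference data an insurer might hold; all values are invented.
KNOWN_BAD_ADDRESSES = {"12 Elm Street"}
SUSPICIOUS_PROVIDERS = {"QuickFix Garage"}


def rate_claim(claim):
    """Return a rating from 0 to 3: one point per red flag the claim links to."""
    rating = 0
    if claim["address"] in KNOWN_BAD_ADDRESSES:
        rating += 1
    if claim["provider"] in SUSPICIOUS_PROVIDERS:
        rating += 1
    # Arbitrary thresholds for "many accidents with multiple carriers".
    many_accidents = claim["prior_accidents"] >= 3
    multiple_carriers = claim["carriers_involved"] >= 2
    if many_accidents and multiple_carriers:
        rating += 1
    return rating


suspicious_claim = {"address": "12 Elm Street", "provider": "QuickFix Garage",
                    "prior_accidents": 4, "carriers_involved": 3}
ordinary_claim = {"address": "7 Oak Avenue", "provider": "City Motors",
                  "prior_accidents": 0, "carriers_involved": 1}

print(rate_claim(suspicious_claim), rate_claim(ordinary_claim))  # 3 0
```

A high rating marks a claim for closer investigation; on its own it does not prove fraud.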

How fast does data arrive?

How much unneeded data is there when it arrives?

How deep should the analysis go to yield the most accurate results?

What type of user interface components need to be included on the SNA dashboard?

SNA method to detect fraud:

• Structured and unstructured data from various sources is fed into the ETL (Extract, Transform, and Load) tool

• This data is then transformed and loaded into a data warehouse

• The analytics team uses information from various sources, scores the risk of fraud, and ranks the likelihood of fraud

• The information used can come from varied sources: prior beliefs, previous relationships, the number of rejected claims, etc.

• Big Data technologies such as text mining, sentiment analysis, content categorization, and social network analysis are incorporated into the fraud detection and predictive modeling mechanism

• Depending on the score of a particular network, an alert is generated

• Investigators can leverage this information and begin researching the fraudulent claim further

• Identified fraud issues are added to the case system

Predictive analysis works on the principle that the earlier fraud is detected, the smaller the loss incurred by the business.

Fraud detection

BIG DATA

Text analytics Sentiment analysis

Predictive analytics

Predictive Analytics Technology

Claims adjusters write lengthy reports while investigating a claim. Clues that an adjuster might not notice are hidden in these reports

A computing system based on business rules highlights clues to possible fraud

The fraud detection system spots these discrepancies and flags the claim as potentially fraudulent
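
A rule-based scan of adjuster reports can be sketched with regular expressions. The rule labels, the patterns, and the sample report below are all hypothetical; they only illustrate the idea of business rules that highlight text a reader might skim past.

```python
import re

# Hypothetical business rules: a label and a pattern that suggests a closer look.
RULES = {
    "no_witnesses": re.compile(r"\bno witness(es)?\b", re.IGNORECASE),
    "recent_policy": re.compile(r"\bpolicy purchased (last|this) week\b",
                                re.IGNORECASE),
    "inconsistent": re.compile(r"\b(inconsistent|contradict\w*)\b",
                               re.IGNORECASE),
}


def find_clues(report):
    """Return the sorted labels of every rule whose pattern occurs in the report."""
    return sorted(label for label, pattern in RULES.items()
                  if pattern.search(report))


report = (
    "Claimant states the car was parked overnight. No witnesses were present. "
    "The statement contradicts the repair invoice date."
)
clues = find_clues(report)
print(clues)  # ['inconsistent', 'no_witnesses']
```

A claim whose report triggers rules like these would be routed to an investigator rather than rejected automatically.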

Customer Relationship Management (CRM)

The following briefly describes how a Social CRM process works:

• Uses the organization's existing CRM to gather data from various social media platforms

• Uses a "listening" tool to extract data from social chatter, which acts as reference data for the existing data in the organization's CRM

• The reference data, along with the information stored in the CRM, is fed into a case management system

• The case management system analyzes the information on the basis of the organization's business rules and sends a response

• The response from the claim management system on a fraudulent claim is confirmed by investigators

Class 1: Introduction to Big Data

The Significance of Social Network Data

Financial Fraud and Big Data

Fraud Detection in Insurance

Use of Big Data in Retail Industry

Use of Big Data in Retail Industry

• How many basic tees did we sell today?

• What time of the year do we sell the most leggings?

• What else has customer X bought?

• What kind of coupons can we send to customer X?
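
The first three retail questions above translate directly into simple aggregations over sales records. The sketch below uses a handful of invented records and a fixed "today" so that it is reproducible; the record layout is an assumption, and at Big Data scale the same logic would run on a distributed platform rather than an in-memory list.

```python
from collections import Counter
from datetime import date

# Invented sales records: (sale date, customer, product, units sold).
sales = [
    (date(2015, 5, 10), "X", "basic tee", 2),
    (date(2015, 5, 10), "Y", "basic tee", 1),
    (date(2015, 5, 10), "X", "leggings", 1),
    (date(2014, 11, 3), "X", "leggings", 3),
    (date(2014, 11, 20), "Z", "leggings", 4),
]
today = date(2015, 5, 10)  # fixed so the example is reproducible

# How many basic tees did we sell today?
tees_today = sum(units for day, _, product, units in sales
                 if day == today and product == "basic tee")

# What time of the year do we sell the most leggings? (units per calendar month)
leggings_by_month = Counter()
for day, _, product, units in sales:
    if product == "leggings":
        leggings_by_month[day.month] += units
best_month = leggings_by_month.most_common(1)[0][0]

# What else has customer X bought?
bought_by_x = sorted({product for _, customer, product, _ in sales
                      if customer == "X"})

print(tees_today, best_month, bought_by_x)  # 3 11 ['basic tee', 'leggings']
```

The coupon question builds on the last result: knowing what customer X has already bought is the starting point for choosing a relevant offer.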

In-store Sales and Online Sales

Most Big Data is simply not required, and not useful either:

• Some information will have long-term strategic value

• Some will be useful only for immediate, tactical use

• Some data won't be used for anything at all

Use of RFID (Radio Frequency Identification) Data in Retail

An RFID tag is a small tag that carries a unique code identifying a product, much like a UPC code. The tag is placed on shipping pallets or product packages, as shown in the adjacent image.

Compared with a bar code, an RFID tag additionally:

• Identifies a pallet as allotted to a precise and exclusive set of computer systems

• Helps in finding situations where items have no units left in the store

• Specifies the number of units of each item remaining in the store, and thereby raises an alarm when restocking is required

• Enables better tracking by differentiating between products that are out of stock and products that are available on the shelf
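
The stock-tracking use of RFID described above can be sketched as counting tag reads per product and comparing the counts with a threshold. The product codes, the catalogue, and the threshold are invented for the example, and a real deployment would read tags from hardware rather than from a list.

```python
from collections import Counter

# Hypothetical RFID reads: each entry is the product code on a tag
# currently detected in the store.
tag_reads = ["TEE-001", "TEE-001", "LEG-002", "TEE-001", "CAP-003"]

# Assumed catalogue and restocking threshold (both invented for the example).
catalogue = ["TEE-001", "LEG-002", "CAP-003", "SOCK-004"]
RESTOCK_THRESHOLD = 2

# Units of each item remaining in the store; missing items count as zero.
units_in_store = Counter(tag_reads)

# Items with no units left versus items that are running low.
out_of_stock = [item for item in catalogue if units_in_store[item] == 0]
needs_restock = [item for item in catalogue
                 if 0 < units_in_store[item] < RESTOCK_THRESHOLD]

print("Out of stock:", out_of_stock)    # ['SOCK-004']
print("Restock soon:", needs_restock)   # ['LEG-002', 'CAP-003']
```

Separating the two lists mirrors the distinction between products that are out of stock and products that are still on the shelf but need replenishing.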

Use of RFID Data in Retail

• saves time

• reduces labor

• enhances the visibility of products throughout the production-delivery life cycle

• saves costs

RECAP

What is the significance of social network data, and how Big Data applies to financial fraud, fraud detection in insurance, and the retail industry

What are the uses of Big Data in the retail industry, and what are RFID data and its advantages

Topic 3 – Technologies for Handling Big Data

Class 1 - Introduction to Big Data

Distribution & Computing for Big Data

Introducing Hadoop

Cloud Computing & In-Memory Technologies for Big Data

DATA PROCESSING

Analysed

Distributed & Parallel Computing

BIG DATA

HADOOP

CLOUD

In-Memory Computing

Transmitter

Receiver

Hello?

I can’t hear you…

Issues caused by Latency:

• Slowdown in system performance

• Data management

• Internal organisational communication

• External communication

Distributed and parallel processing techniques process large amounts of data and also deal with latency.

Distributed System

A collection of independent computer systems that are connected via a network to accomplish a specific task.

Parallel System

A computer system that has multiple processing units attached to it.

Parallel Computing Techniques

• Clusters or Grids

• Massively Parallel Processing (MPP)

• High-Performance Computing (HPC)
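
The common idea behind these techniques is to split the work, process the pieces independently, and combine the partial results. The sketch below illustrates that pattern with Python's standard `concurrent.futures` module. The data is invented, and a thread pool is used only as a stand-in for multiple processing units: in CPython, threads do not speed up CPU-bound work the way separate processors or machines would.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Work done independently on one chunk of the data."""
    return len(chunk.split())


# Invented data, already split into chunks (one per worker).
chunks = [
    "big data needs new tools",
    "distributed systems share the work",
    "parallel units run side by side",
]

# The pool hands each chunk to a worker; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, chunks))

# Combine the partial results into the final answer.
total_words = sum(partial_counts)
print(partial_counts, total_words)  # [5, 5, 6] 16
```

The same split, process, and combine structure reappears at much larger scale in cluster and MPP systems.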

Public Cloud vs Private Cloud

Topic 3 – Technologies for Handling Big Data

Distribution & Computing for Big Data

Introducing Hadoop

Cloud Computing & In-Memory Technologies for Big Data

Features of Hadoop:

• Works on multiple machines without sharing memory

• Distributes data over different servers

• Can track data stored on different servers

• Runs all available servers in parallel

• Keeps multiple copies of data

Hadoop Cluster

Gateway Node

Switch

Server 1 Server 2 Server 3 Server 4 Server 5

MapReduce

How does Hadoop work?

• An organisation's data is loaded into the Hadoop software

• The data is divided into pieces, which are sent to different servers

• Hadoop keeps track of where the data resides and sends the job code to all the servers that store a relevant piece of the data

• Each server applies the job code to the portion of data stored on it and returns the results
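
The steps above can be modelled in a few lines of plain Python. This is a toy model, not the Hadoop API: the server names, the block size, the replication factor, and the round-robin placement are all assumptions made for illustration.

```python
# A toy model of the steps above. Server names, block size, and the
# replication factor are assumptions made for illustration only.
SERVERS = ["server1", "server2", "server3"]
REPLICAS = 2


def distribute(records, block_size):
    """Split records into blocks and place each block on REPLICAS servers."""
    storage = {name: {} for name in SERVERS}   # server -> {block_id: records}
    for block_id, start in enumerate(range(0, len(records), block_size)):
        block = records[start:start + block_size]
        for copy in range(REPLICAS):
            server = SERVERS[(block_id + copy) % len(SERVERS)]
            storage[server][block_id] = block
    return storage


def run_job(storage, job):
    """Apply the job to each block once, on the first server found holding it."""
    results = {}
    for server in SERVERS:
        for block_id, block in storage[server].items():
            if block_id not in results:
                results[block_id] = job(block)
    return [results[block_id] for block_id in sorted(results)]


data = list(range(1, 11))                 # ten records: 1..10
storage = distribute(data, block_size=4)  # blocks: [1-4], [5-8], [9-10]
partial_sums = run_job(storage, sum)      # the "job code" here is sum()
total = sum(partial_sums)
print(partial_sums, total)  # [10, 26, 19] 55
```

The `storage` mapping plays the role of Hadoop's bookkeeping about which server holds which piece, and keeping two copies of every block reflects the "multiple copies of data" feature.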

Indexing Job

Hadoop Software

Server 1 Server 2 Server 3

Job Code 1 +Processing Data

Job Code 2 +Processing Data

Job Code 3 +Processing Data

Result

EXAMPLE:

user_id, user_name, city_name, service_provider_name, and call_time
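
Using the fields named in the example above, a MapReduce-style job can be sketched in plain Python as three phases: map, shuffle, and reduce. The records are invented, `call_time` is assumed to be a call duration in seconds, and the function names are illustrative rather than part of any Hadoop API.

```python
from collections import defaultdict

# Invented call records with the fields named in the example above.
# Assumption: call_time is a duration in seconds.
call_records = [
    {"user_id": 1, "user_name": "Asha", "city_name": "Delhi",
     "service_provider_name": "ProviderA", "call_time": 120},
    {"user_id": 2, "user_name": "Ravi", "city_name": "Mumbai",
     "service_provider_name": "ProviderB", "call_time": 45},
    {"user_id": 3, "user_name": "Meena", "city_name": "Delhi",
     "service_provider_name": "ProviderB", "call_time": 30},
]


def map_phase(record):
    """Emit a (key, value) pair: the city and the call time of one record."""
    return record["city_name"], record["call_time"]


def shuffle(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)


pairs = [map_phase(r) for r in call_records]
totals = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(totals)  # {'Delhi': 150, 'Mumbai': 45}
```

Changing the key in `map_phase` to `service_provider_name` would give total call time per provider without touching the other two phases.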

RECAP

Various aspects of distribution and computing for Big Data

Hadoop as a technology for handling Big Data

Topic 3 – Technologies for Handling Big Data

Class 1 - Introduction to Big Data

Distribution & Computing for Big Data

Introducing Hadoop

Cloud Computing & In-Memory Technologies for Big Data

Features of Cloud Computing:

• Scalability

• Elasticity

• Resource Pooling

• Self Service

• Low Costs

• Fault Tolerance

What are Cloud Deployment Models?

PRIVATE CLOUD

Categories of Cloud Services:

Other Amazon Web Services:

• Amazon Elastic MapReduce

• Amazon DynamoDB

• Amazon S3

• Amazon High-Performance Computing

• Amazon Redshift

Google Web Services:

• Google Compute Engine

• Google BigQuery

• Google Prediction API

Windows Azure

In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally.

RECAP

In this session we discussed cloud computing & various in-memory technologies for handling Big Data.
