big data gaurav
DESCRIPTION
Big Data Class 1
TRANSCRIPT
BUMPER
Class 1: Introduction to Big Data
Understanding Big Data
Business Applications of Big Data
Technologies for handling Big Data
Big Data Management Systems – Databases & Warehouses
Analytics & Big Data
Topic 1
Class 1 Introduction to Big Data
Understanding Big Data
Topic 1 – Understanding Big Data
What is Big Data?
Structuring & Elements
Application in Business & Careers
DATA
Personal Computers
YouTube
ATMs
Dropbox
Picasa
2002: 5 Exabytes of Online Data
2009: 281 Exabytes of Online Data (56 Times Increase)
What is Big Data?
A pool of large-sized datasets to capture, store, search, share, transfer, analyse, and visualise related information or data within an acceptable elapsed time.
Data = Information
Information = Insight
• Every second, consumers make 10,000 payment card transactions worldwide
• Every hour, Walmart handles more than 1 million customer transactions
• Every day, Twitter users post 500 million tweets
• Facebook users post 2.7 billion likes and comments in a day
BIG DATA
Is a new data challenge that requires leveraging existing systems differently
Is classified in terms of:
Volume (terabytes, records, transactions)
Variety (internal, external, behavioural, and/or social)
Velocity (near or real-time assimilation)
Is usually unstructured and qualitative in nature
Advantages of Studying Big Data:
• Understanding the target customer
• Cutting down expenditures in healthcare
• Increase in operating margins in retail
• Profits with improvements in operational efficiency
Industries that Benefit:
• Sports
• Science and Research
• Security and Law Enforcement
• Financial Trading
Departments that can Benefit:
• Procurement
• Product Development
• Manufacturing
• Distribution
• Marketing
• Price Management
• Merchandising
• Sales
• Store Operations
• Human Resources
Flu Indications & Warnings
Massive Data Collection
Analyse Collected Data
Early Warnings for Flu Plague
Social Data from Networking Sites reveals Behavioural Patterns
Use Big Data for Growth & Value Addition
RECAP
What is Big Data, its advantages and various sources
Topic 1
Class 1 - Introduction to Big Data
Understanding Big Data
Class 1 - Introduction to Big Data
What is Big Data?
Structuring & Elements
Application in Business & Careers
How do I choose a book, of the millions available on my favorite sites or stores?
How can I use the vast amount of data and information I come across?
How do I keep myself updated on events and news?
Which news articles should I read?
Formats of Data:
Sources of Data:
Internal – Organisational or enterprise data
External – Social data from the internet or government
Structured Data
Unstructured Data
Semi-Structured Data
BIG DATA
Structured Data
Features of Structured Data:
• Has a predefined format
• Resides in fixed fields within a record
• Has its attributes mapped
• Is used to report against predetermined data types
Sources of Structured Data:
• Relational databases
• Flat files in record format
• Multidimensional databases
• Legacy databases
Unstructured Data
Sources of Unstructured Data:
• Organisational Data
• Social Media
• Mobile Data
Challenges of Using Unstructured Data:
• Making sense of the data is difficult and time-consuming
• Combining and linking unstructured data with more structured information is difficult
• Costs rise because of wasted storage and the human resources needed
Semi-Structured Data
Sources of Semi-Structured data:
• Database systems
• File systems like Web data and bibliographic data
• Data exchange formats like scientific data
Sl. No   Name                                   E-mail
1.       Sam Jacobs                             [email protected]
2.       First Name: David, Last Name: Brown    [email protected]
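The two rows in this example carry the same kind of information in different shapes, which is what makes the data semi-structured. A minimal Python sketch of handling both shapes, assuming the rows arrive as dictionaries; the helper name `full_name` and the key names are hypothetical, and the e-mail values are kept as the redacted placeholder that appears in the slide.

```python
# Two records with the same meaning but different shapes (semi-structured).
records = [
    {"sl_no": 1, "name": "Sam Jacobs", "email": "[email protected]"},
    {"sl_no": 2, "name": {"first": "David", "last": "Brown"}, "email": "[email protected]"},
]

def full_name(record):
    """Return the name as one string, whichever shape the record uses."""
    name = record["name"]
    if isinstance(name, dict):
        # Nested form: join the first and last name parts.
        return f"{name['first']} {name['last']}"
    return name

names = [full_name(r) for r in records]
print(names)
```

Once every record is reduced to the same shape, it can be loaded into fixed fields like structured data.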
Volume
Velocity
Variety
Big Data Application In Business Analytics
What are the areas where Big Data can be applied?
Transportation
Provides improved traffic information and autonomous features
Education
Provides innovative approaches for teachers to analyze students
Travel
Applies analytics to pricing, inventory, and advertising to improve customer experiences
Governments
Helps make informed decisions on fraud management, discover unknown threats, and ensure the security of the global supply chain
Healthcare
Helps establish clinical protocols that ensure the best health outcomes for patients
Careers in Big Data
BIG Career Opportunities
Major Big Data Hiring Companies:
Product companies, e.g., Oracle
Technology drivers, e.g., Google
Services companies, e.g., EMC
Data analytics companies, e.g., Splunk
The most common job titles in Big Data include:
Big Data Analyst
Big Data Scientist
Big Data Developer
Module 1: Introduction to Big Data

Big Data Analyst Certification Track
Module 2: Introduction to Analytics & R Programming
Module 3: Data Analysis Using R
Module 4: Advanced Analytics Using R
Module 5: Machine Learning Concepts
Module 6: Social Media, Mobile Analytics & Visualisation
Module 7: Industry Applications of Big Data

Big Data Developer Certification Track
Module 2: Managing a Big Data Ecosystem
Module 3: Storing & Processing Data: HDFS & MapReduce
Module 4: Increasing Efficiency with Hadoop Tools
Module 5: Additional Hadoop Tools: ZooKeeper, Sqoop, Flume, YARN & Storm
Module 6: Leveraging NoSQL & Hadoop: Real Time, Security & Cloud
Module 7: Commercial Hadoop Distribution & Management Tools

Complete Project
Wrox Certified Big Data Analyst/Developer
Technical Skills Required for a Big Data Analyst:
• Handle & analyse massive data sets using MapReduce
• Hadoop and its components HBase and Hive
• SQL and NoSQL tools and query languages such as Impala, Hive, and Pig
• Analytical tools such as SAS, R, Tableau
• Statistical techniques to implement text analytics solutions
• Data handling and manipulation techniques
• Generate client-ready dashboards, reports, and visualizations
Soft Skills Required:
• Strong written & verbal communication skills
• Analytical Ability
• Basic understanding of how a business works
Future of Big Data
RECAP
What are the various types and structures of Big Data and the elements that form it
What are the business applications of Big Data and the career opportunities associated
BIG DATA
Topic 2: Business Applications of Big Data
Class 1: Introduction to Big Data
Social Media
Topic 2: Business Applications of Big Data
Significance of Social Network Data
Financial Fraud & Big Data
Fraud Detection in Insurance
Use in Retail Industry
Significance of Social Network Data
What is Social Network Data?
What is Social Network Analysis?
What are the uses of Social Network Data Analysis?
What is Sentiment Analysis?
DATA
Social Media
AGE
GENDER
LOCATION
Social Network Analysis (SNA)
Social Network
DATA
Analysis
Total Number of calls
Total Number of SMS
Structure of a Caller’s Social Network
Social Network Site
Social Network Analysis: a Big Data Problem
Social Network Analysis (SNA)
Business Intelligence
Marketing
Product Design & Development
Customer Relationship Management (CRM)
Social Network Data Analysis
Provides new contexts in which decisions are data-driven, not opinion-driven
Enables organizations to shift goals to maximize the profitability of a customer's network
Enables organizations to identify highly connected customers
Enables organizations to lure highly connected customers with free trials and solicit their feedback
Enables organizations to encourage internal customers to become more active
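Identifying highly connected customers can be illustrated with a small call graph. This is only a sketch, assuming call records are available as (caller, callee) pairs; the sample pairs, the function names, and the `min_contacts` threshold are all hypothetical, and counting distinct contacts is just one simple measure of how connected a customer is.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee) pairs.
calls = [("A", "E"), ("A", "F"), ("B", "A"), ("B", "D"), ("C", "H"), ("A", "D")]

def connection_counts(call_pairs):
    """Count the distinct contacts of each customer (degree in the call graph)."""
    contacts = defaultdict(set)
    for caller, callee in call_pairs:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {customer: len(peers) for customer, peers in contacts.items()}

def highly_connected(call_pairs, min_contacts):
    """Return customers with at least min_contacts distinct contacts, sorted by name."""
    counts = connection_counts(call_pairs)
    return sorted(c for c, n in counts.items() if n >= min_contacts)

print(highly_connected(calls, min_contacts=3))
```

Customers returned by `highly_connected` would be the natural candidates for free trials and feedback requests.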
Social Data Analysis
Analyze Media Communication
Product Development and Offerings
Sentiment Analysis
Marketers Business Professionals
3,46,259 Followers
2,73,591 Likes
But is one of the most disliked airlines. Why?
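One way to approach this question is to score what people actually write instead of counting followers and likes. The sketch below is a deliberately tiny word-list approach to sentiment analysis; the word lists, the sample posts, and the function name `sentiment_score` are hypothetical, and a production system would rely on a much larger lexicon or a trained model.

```python
# Tiny hand-made word lists, for illustration only.
POSITIVE = {"good", "great", "love", "friendly", "comfortable"}
NEGATIVE = {"bad", "late", "lost", "rude", "delayed", "worst"}

def sentiment_score(text):
    """Positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical posts about an airline.
posts = [
    "Flight delayed again and the staff were rude",
    "They lost my bag. Worst airline!",
    "Great crew, comfortable seats",
]
scores = [sentiment_score(p) for p in posts]
negative_share = sum(s < 0 for s in scores) / len(scores)
print(scores, negative_share)
```

A high share of negative posts alongside a large follower count would point to exactly the mismatch the slide describes.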
RECAP
What is social network data and analysis
What are its uses and values
BIG DATA
Class 1: Introduction to Big Data
Topic 2: Business Applications of Big Data
Significance of Social Network Data
Financial Fraud & Big Data
Fraud Detection in Insurance
Use in Retail Industry
BANK
Common Financial Frauds
Credit Card Frauds
Exchange or Return Policy Fraud
Personal Information Fraud
Understand customers' ordering patterns
Prevent frauds
Watch out for red flags
Big Data
Fraud
Analyzing data with a small sample size: cannot reveal the various patterns of fraud
Analyzing data with a large sample size: can reveal the various patterns of fraud
• Earlier, the sample size could not be increased because it required huge investments in time and money
• Big Data techniques can overcome this challenge
Big Data analytics can…
Run checks on all data to identify fraudulent transactions
Identify new kinds of fraud and add them to the set of fraud-prevention checks
Avoid impeding customers with unnecessary policies and governance structures
Fraud Detection in Real Time
BIG DATA
live transactions
sources of data
BIG DATA
Historical data indicate fraud patterns
Checks to prevent real-time fraud
Real-time analysis
BIG DATA
Create comparisons
Drawing Maps & Graphs
Decisions and effective systems
BLOCK FRAUD
Topic 2: Business Applications of Big Data
The Significance of Social Network Data
Financial Fraud and Big Data
Fraud Detection in Insurance
Use of Big Data in the Retail Industry
Insurance Company
Needs to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time
Incurs a steady increase in the cost of litigation and fraudulent claims
Underwriters do not have the required data at the right time to make the necessary decisions, which further delays processing
BIG DATA
Social Media Data
Note for underwriter
Social Media Triggers to identify Fraud
In the claim, a customer might state that his or her car was destroyed in a flood.
Documentation from the social media feed shows that the car was actually in another city on the day the flood occurred.
Such glaring discrepancies indicate FRAUD.
Insurance Frauds
Have huge cost implications for organizations
Organizations prefer using Big Data analytics and other advanced technologies to detect them
Reducing fraud has a positive impact on customers, because fraud losses are otherwise passed on to them as higher premiums
Big Data analytics platform
Organizations are now able to analyze complex information and accident scenarios in minutes rather than days or months
INSURANCE
Traditional methods of identifying fraud:
Typically use small samples of data for analysis
Rely on previously recorded fraud cases
Every time a fraud based on a new technique occurs, insurance companies have to bear the consequences and the losses the first time
Work in independent silos
Are not capable of handling various sources of information from different channels and different functions in an integrated way
Fraud Detection Methods
Statistical Models
Public Data
Bank Statements
Legal Judgments
Criminal Records
Medical Bills
Social Network Analysis (SNA)
Big Data can be used to create visibility into blind spots for businesses
SNA is an innovative and effective way to identify and detect frauds
SNA tool uses a mix of analytical methods
• Statistical methods
• Pattern analysis
• Link analysis
When link analysis is used in fraud detection:
• It looks for clusters of data and how those clusters are linked to other data clusters
• Public records are among the various data sources that can be integrated into a model
• The insurer can rate claims
• If the rating is high, it indicates that the claim is fraudulent
• Reasons for a high rating include a known bad address, a suspicious provider, or a vehicle that was involved in many accidents with multiple carriers
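The idea of rating a claim from such links can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the reference sets, the field names, the `accident_threshold` value, and the one-point-per-red-flag rating. A real SNA tool would combine statistical, pattern, and link analysis rather than a simple count.

```python
# Hypothetical reference data an insurer might assemble from public records.
KNOWN_BAD_ADDRESSES = {"12 Elm Street"}
SUSPICIOUS_PROVIDERS = {"QuickFix Garage"}

def rate_claim(claim, accident_threshold=3):
    """Return (rating, reasons): one point per red flag linked to the claim."""
    reasons = []
    if claim["address"] in KNOWN_BAD_ADDRESSES:
        reasons.append("known bad address")
    if claim["provider"] in SUSPICIOUS_PROVIDERS:
        reasons.append("suspicious provider")
    if claim["prior_accidents"] >= accident_threshold and claim["carriers"] > 1:
        reasons.append("many accidents with multiple carriers")
    return len(reasons), reasons

# Hypothetical claim record.
claim = {"address": "12 Elm Street", "provider": "City Motors",
         "prior_accidents": 4, "carriers": 3}
rating, reasons = rate_claim(claim)
print(rating, reasons)
```

Returning the reasons along with the rating gives an investigator something concrete to check, rather than a bare number.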
How fast does data arrive?
How much unneeded data is there when it arrives?
How deep should the analysis be to produce the most accurate results?
What types of user interface components need to be included on the SNA dashboard?
SNA method to detect fraud:
• Structured and unstructured data from various sources are fed into the ETL (Extract, Transform, and Load) tool
• This data is then transformed and loaded into a data warehouse
• The analytics team uses information from various sources, scores the risk of fraud, and ranks the likelihood of fraud
• The information used can come from varied sources: prior belief, previous relationships, number of rejected claims, etc.
• Big Data technologies (text mining, sentiment analysis, content categorization, and social network analysis) are included in the fraud detection and predictive modeling mechanism
• Depending on the score of a particular network, an alert is generated
• Investigators can leverage this information and begin researching the fraudulent claim further
• Frauds that are identified are added to the case system
Predictive analysis works on the principle that the earlier the fraud is detected, the lower the loss incurred by a business.
Fraud detection
BIG DATA
Text analytics
Sentiment analysis
Predictive analytics
Predictive Analytics Technology
Claim adjusters write lengthy reports while investigating a claim. Clues hidden in these reports may go unnoticed by the adjuster
A computing system based on business rules highlights clues of possible fraud
The fraud detection system spots these discrepancies and flags the claim as fraudulent
Customer Relationship Management (CRM)
The following briefly describes how a Social CRM process works:
Uses the organization's existing CRM to gather data from various social media platforms
Uses a “listening” tool to extract data from social chatter, which acts as reference data for the existing data in the organization's CRM
The reference data, along with the information stored in the CRM, is fed into a case management system
The case management system analyzes the information on the basis of the organization's business rules and sends a response
The response from the claim management system on a fraudulent claim is confirmed by investigators
Class 1: Introduction to Big Data
The Significance of Social Network Data
Financial Fraud and Big Data
Fraud Detection in Insurance
Use of Big Data in Retail Industry
Use of Big Data in Retail Industry
How many basic tees did we sell today?
What time of the year do we sell most leggings?
What else has customer X bought?
What kind of coupons can we send to customer X?
In-store Sales Online Sales
Most Big Data is neither required nor useful
• some information will have long-term strategic value
• some will be useful only for immediate, tactical use
• some data won't be used for anything at all
Use of RFID (Radio Frequency Identification) Data in Retail
An RFID tag is a small tag that carries a unique code to identify a product, much like a UPC code. The tag is placed on shipping pallets or product packages.
In addition to what a bar code offers, an RFID tag:
Specifies the pallet as allotted to a precise and exclusive set of computer systems
Helps in finding situations where items have no units left in store
Specifies the number of units of each item remaining in store, and thereby raises an alarm when restocking is required
Allows better tracking by differentiating products that are out of stock from products that are available on the shelf
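The restocking alarm described above can be sketched as a count over tag reads. This is an illustrative assumption about how such reads might look: each read is a (tag_id, item) pair, the same tag may be read more than once, and the function names, sample reads, and `reorder_level` are hypothetical.

```python
from collections import Counter

def shelf_counts(tag_reads):
    """Count units per item from (tag_id, item) reads, ignoring repeat reads of a tag."""
    unique_tags = {tag_id: item for tag_id, item in tag_reads}
    return Counter(unique_tags.values())

def restock_alerts(tag_reads, catalog, reorder_level):
    """Split catalog items into out-of-stock and low-stock lists."""
    counts = shelf_counts(tag_reads)
    out_of_stock = sorted(i for i in catalog if counts[i] == 0)
    low_stock = sorted(i for i in catalog if 0 < counts[i] <= reorder_level)
    return out_of_stock, low_stock

# Hypothetical reads from a shelf scanner; tag "t2" was read twice.
reads = [("t1", "basic tee"), ("t2", "basic tee"), ("t2", "basic tee"),
         ("t3", "leggings"), ("t4", "basic tee"), ("t5", "basic tee")]
catalog = ["basic tee", "leggings", "scarf"]
print(restock_alerts(reads, catalog, reorder_level=1))
```

Keeping the out-of-stock list separate from the low-stock list mirrors the distinction between products that are gone and products that are still on the shelf.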
Use of RFID Data in Retail
• saves time
• reduces labor
• enhances the visibility of products throughout the production-delivery life cycle
• saves costs
RECAP
What is the significance of Social Network Data, Financial Fraud, Fraud Detection in Insurance, and the uses of Big Data in the Retail Industry
What are the uses of Big Data in the Retail Industry, RFID Data and its advantages
Topic 3
Class 1 - Introduction to Big Data
Technologies for Handling Big Data
Topic 3 – Technologies for Handling Big Data
Distribution & Computing for Big Data
Introducing Hadoop
Cloud Computing & In-Memory Technologies for Big Data
DATA PROCESSING
Analysed
Distributed & Parallel Computing
BIG DATA
HADOOP
CLOUD
In-Memory Computing
Transmitter
Receiver
Hello?
I can’t hear you…
Issues caused by Latency:
Slowdown in system performance
Data management
Internal organisational communication
External communication
Distributed and Parallel processing techniques process large amounts of data and also deal with latency.
Distributed System
A collection of independent computer systems that are connected via a network to accomplish a specific task.
Parallel System
A computer system that has multiple processing units attached to it.
Parallel Computing Techniques
Clusters or Grids
Massively Parallel Processing (MPP)
High-Performance Computing (HPC)
Public Cloud vs Private Cloud
Features of Hadoop:
• Works on multiple machines without sharing memory
• Distributes data over different servers
• Can track data stored on different servers
• Runs all available servers in parallel
• Keeps multiple copies of data
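Distributing data across servers while keeping multiple copies can be illustrated with a toy placement scheme. The sketch below is not how Hadoop actually chooses servers; the round-robin placement, the function names, and the block size are assumptions chosen only to show why extra copies let a read succeed when one server is down.

```python
def place_blocks(data, servers, block_size, replicas):
    """Split data into blocks and assign each block to `replicas` distinct servers."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = {}  # block index -> list of servers holding a copy
    for index in range(len(blocks)):
        # Round-robin placement: an illustrative assumption, not Hadoop's policy.
        placement[index] = [servers[(index + k) % len(servers)] for k in range(replicas)]
    return blocks, placement

def read_block(index, placement, failed):
    """Return the first live server that holds the block, or None if all copies are lost."""
    for server in placement[index]:
        if server not in failed:
            return server
    return None

servers = ["Server 1", "Server 2", "Server 3", "Server 4", "Server 5"]
blocks, placement = place_blocks("abcdefghij", servers, block_size=4, replicas=3)
print(placement[0], read_block(0, placement, failed={"Server 1"}))
```

The `placement` dictionary plays the role of the bookkeeping that tracks which servers hold which piece of data.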
Hadoop Cluster
Gateway Node
Switch
Server 1 Server 2 Server 3 Server 4 Server 5
MapReduce
How does Hadoop work?
• Data of an organisation is loaded into the Hadoop software
• Data is divided into different pieces & sent to different servers
• Hadoop keeps track of the data by sending a job code to all the servers that store the relevant piece of data
• Each server applies the job code to the portion of data stored on it and returns results
Indexing Job
Hadoop Software
Server 1 Server 2 Server 3
Job Code 1 + Processing Data
Job Code 2 + Processing Data
Job Code 3 + Processing Data
Result
EXAMPLE:
user_id, user_name, city_name, service_provider_name, and call_time
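Using these columns, the split, apply, and combine flow can be sketched in plain Python. This runs in a single process and only imitates the idea; it does not use Hadoop. The sample rows, the function names, and the reading of `call_time` as a call duration in seconds are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical call records; call_time is assumed to be a duration in seconds.
records = [
    {"user_id": 1, "user_name": "Asha", "city_name": "Delhi",
     "service_provider_name": "Provider A", "call_time": 120},
    {"user_id": 2, "user_name": "Ravi", "city_name": "Mumbai",
     "service_provider_name": "Provider B", "call_time": 60},
    {"user_id": 3, "user_name": "Meera", "city_name": "Delhi",
     "service_provider_name": "Provider A", "call_time": 30},
    {"user_id": 4, "user_name": "John", "city_name": "Pune",
     "service_provider_name": "Provider B", "call_time": 45},
]

def split_records(rows, n_servers):
    """Divide the rows into one piece per server (round-robin)."""
    return [rows[i::n_servers] for i in range(n_servers)]

def map_job(piece):
    """The job each server applies to its own piece: emit (city, call_time) pairs."""
    return [(row["city_name"], row["call_time"]) for row in piece]

def reduce_job(mapped_pieces):
    """Combine the per-server results into total call time per city."""
    totals = defaultdict(int)
    for pairs in mapped_pieces:
        for city, seconds in pairs:
            totals[city] += seconds
    return dict(totals)

pieces = split_records(records, n_servers=3)
result = reduce_job(map_job(piece) for piece in pieces)
print(result)
```

Each call to `map_job` touches only its own piece, which is what would allow the pieces to be processed on different servers in parallel.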
RECAP
Various aspects of distribution and computing for Big Data
Hadoop as a technology for handling Big Data
Topic 3
Class 1 - Introduction to Big Data
Technologies for Handling Big Data
Topic 3 – Technologies for Handling Big Data
Distribution & Computing for Big Data
Introducing Hadoop
Cloud Computing & In-Memory Technologies for Big Data
Features of Cloud Computing:
• Scalability
• Elasticity
• Resource Pooling
• Self Service
• Low Costs
• Fault Tolerance
What are Cloud Deployment Models?
PRIVATE CLOUD
Categories of Cloud Services:
Other Amazon Web Services:
• Amazon Elastic MapReduce
• Amazon DynamoDB
• Amazon S3
• Amazon High-Performance Computing
• Amazon Redshift
Google Web Services:
• Google Compute Engine
• Google BigQuery
• Google Prediction API
Windows Azure
In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally.
RECAP
In this session we discussed cloud computing & various in-memory technologies for handling Big Data.