Hadoop for Beginners: Free Course PPT
DESCRIPTION
This is a PowerPoint presentation on Hadoop and Big Data. It covers the essential knowledge one should have when stepping into the world of Big Data. The course is available for free on hadoop-skills.com. It builds a basic understanding of Big Data problems and of Hadoop as a solution. The course takes you through:
• An understanding of Big Data problems, with easy-to-understand examples and illustrations.
• The history and advent of Hadoop, right from when Hadoop wasn’t even named Hadoop and was called Nutch.
• The Hadoop magic that makes it so unique and powerful.
• The difference between data science and data engineering, which is one of the big confusions when selecting a career or understanding a job role.
• Most importantly, demystifying Hadoop vendors like Cloudera, MapR and Hortonworks.
TRANSCRIPT
This is a free Course Available on Hadoop-Skills.com
Hadoop For Beginners
Available for free at hadoop-skills.com
Understanding Big Data
Not the usual way
The hype around Big Data
Facebook, Twitter and Google generate petabytes of data every day.
The Hadron Collider project discards large amounts of data that it won’t be able to analyse, hoping that nothing valuable has been thrown away.
Interesting facts, but… why is Big Data important?
Let’s understand via an example.
Bank Example in the 90s
Bank
Optimal Price?
Maximise Profit
Insurance
3rd Party Survey
Expert Debates
Optimal Price
Bank Example – this Century
Bank
Optimal Price?
Maximise Profit
Insurance
Optimal Price
Data Warehousing
Repository
Web Activity
Transaction
Competitors Pricing
Market Trends
Statistics
Data Warehouse
Run Statistical Algorithms
Decision Support System
Data Warehouse - Limitations
• Worked on small samples of data.
  • Like looking through a keyhole to find the size of the room.
• High turnaround time for meaningful results.
  • Like deciding to cross the road based on a picture taken 5 minutes ago.
Big Data – Textbook Definition
“Big data are a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”
- Tom White, Definitive Guide
Volume Velocity Variety
Bank Example
Bank
Optimal Price?
Maximise Profit
Insurance
Optimal Price
Data Warehousing
Repository
Web Activity
Transaction
Competitors Pricing
Market Trends
Statistics
Data Warehouse
Run Statistical Algorithms
Decision Support System
Future Role of Data
What the Industry is striving for…
Role of Data – Now and in Future
Decision Support System
Digital Nervous System
Data
Fundamental block to
Data
Fundamental block to
Business @ speed of thought
Sense
Interpret
Decide
Act
Organisations behaving like a biological nervous system
Avatar
Skynet
Example – Digital Nervous System
Bank
Repository
Web Activity
Transaction
Competitors Pricing
Market Trends
Statistics
Optimal Price
Mobile Alert with Travel insurance
Data – e-tsunami
International Data Corporation’s (IDC) 6th annual study:
From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes.
More than 5,200 gigabytes for every man, woman, and child in 2020.
From now until 2020, the digital universe will about double every two years.
33% of the digital data might be valuable if analysed, compared with 25% today.
From Gartner:
4.4 Million IT Jobs Globally to Support Big Data By 2015.
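The IDC figures above can be sanity-checked with a little arithmetic. The sketch below is not part of the original slides: it assumes decimal units (1 exabyte = 10^9 gigabytes), and the function names are made up for illustration.

```python
import math

GB_PER_EXABYTE = 10**9  # assumption: decimal (SI) units


def growth_factor(start_eb, end_eb):
    """How many times larger the end volume is than the start volume."""
    return end_eb / start_eb


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes under the decimal-unit assumption."""
    return exabytes * GB_PER_EXABYTE


def implied_doubling_time(start_eb, end_eb, years):
    """Years per doubling if growth were perfectly exponential over the period."""
    doublings = math.log2(end_eb / start_eb)
    return years / doublings


if __name__ == "__main__":
    factor = growth_factor(130, 40_000)
    total_gb = exabytes_to_gigabytes(40_000)
    # Population implied by the "5,200 gigabytes per person" figure.
    implied_people = total_gb / 5_200
    doubling = implied_doubling_time(130, 40_000, 2020 - 2005)

    print(f"growth factor 2005-2020: {factor:.0f}x")
    print(f"40,000 exabytes in gigabytes: {total_gb:,}")
    print(f"implied world population: {implied_people / 1e9:.1f} billion")
    print(f"implied doubling time: {doubling:.1f} years")
```

Under these assumptions the quoted numbers are consistent with each other: a factor of roughly 300, 40 trillion gigabytes, and a doubling time of a little under two years.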
What Triggered Big Data Technologies
Knowing Hadoop when it wasn’t Hadoop…
History of Hadoop
1996-2000, 2003-04, 2005-06, 2010, 2013
Google File System and MapReduce papers
YARN / MapReduce 2 / Next Generation Hadoop
Hadoop spawns off Nutch
Big Data problem faced by all search engines
and Mike
Dreadnaught
Doug joins Cloudera
0.xx releases of Hadoop
Introduction to Hadoop Magic
What is the new thing that Hadoop brings to computing…
Ox and the load
Distributed Computing
Price Advantage:
1. Clusters use commodity hardware, which is cheaper than one expensive server.
2. The software license is free.
Hadoop Framework – Brief overview
HDFS
MapReduce
Google File System
Google MapReduce
file1
Name node
Data nodes
map map map map map Reduce
User
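To make the "map map map … Reduce" picture above concrete, here is a minimal in-memory sketch of the MapReduce idea using a word count. It is an illustration only: it runs in a single Python process, does not use Hadoop or its API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over the input lines in one process."""
    pairs = []
    for line in lines:  # each line stands in for an input split given to one map task
        pairs.extend(map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data nodes store data"]
    print(word_count(sample))
```

In a real cluster the map calls would run in parallel on the data nodes that hold each piece of the file, and only the much smaller intermediate pairs would travel over the network to the reducers.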
The New Fundamentals
• Moving the code to the data.
• Use of commodity hardware and open source software instead of expensive proprietary software on expensive custom hardware.
• Schema on read.
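The schema-on-read idea from the list above can be sketched in a few lines: raw records are stored untouched, and a schema is applied only when the data is read. This is a hedged illustration in plain Python, not Hadoop or Hive code; the sample records, schema names and `read_with_schema` helper are all invented for the example.

```python
import csv
import io

# Raw data is stored exactly as it arrived; nothing validates it on write.
RAW_RECORDS = "1001,2013-05-01,250.75\n1002,2013-05-02,99.10\n"

# Hypothetical schemas: each is a list of (column name, parser) pairs.
TRANSACTION_SCHEMA = [("account_id", int), ("date", str), ("amount", float)]
ACCOUNT_ONLY_SCHEMA = [("account_id", str)]


def read_with_schema(raw_text, schema):
    """Apply a schema while reading: parse each raw row into a typed record."""
    reader = csv.reader(io.StringIO(raw_text))
    for row in reader:
        # zip stops at the shorter sequence, so a schema may cover only
        # the leading columns of each row.
        yield {name: parse(value) for (name, parse), value in zip(schema, row)}


if __name__ == "__main__":
    # The same stored bytes, interpreted in two different ways at read time.
    for record in read_with_schema(RAW_RECORDS, TRANSACTION_SCHEMA):
        print(record)
    for record in read_with_schema(RAW_RECORDS, ACCOUNT_ONLY_SCHEMA):
        print(record)
```

The design point is that a new question about old data only needs a new schema, not a reload of the data, which is the opposite of the schema-on-write approach of a traditional data warehouse.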
Hadoop Ecosystem
The umbrella of tools around Hadoop…
Hadoop Ecosystem
HDFS
MapReduce HBase
Pig Hive
Sqoop/Flume
Log collection
Yahoo Facebook
Storm
Chukwa
Kafka
Structured Stores
Message broker
Oozie
A Few Interesting Discussions
Hadoop-skills.com: bringing Hadoop learners together…
Simpler Vs Complex Algorithms
Complex Algorithm on a small dataset
Simple Algorithm on a large dataset
1. Complex algorithms need to be correctly sensitive to weak correlations.
2. Complex algorithms are thus difficult to code and design.
Data Engineer Vs Data Scientist
Data Engineer
Role: To engineer software solutions.
Skills: More of programming and technical skills, and the ability to architect technical solutions.
Data Scientist
Role: To solve business problems using data.
Skills: Strong mathematical skills and an understanding of statistical models.
Hadoop Vendors
-> Skeleton version.
-> All the ecosystem members need to be installed additionally.
-> Important ecosystem members included.
-> A few proprietary tools, like Enterprise Manager.
-> Proprietary Hadoop code written in C.
-> Integrated with Hadoop ecosystem members.
-> Based on Apache Hadoop.
-> Supports the .NET framework.
-> Launches Hadoop distribution: Pivotal HD.
Thank You!!!
I got a lucky chance to meet Doug!!!
And explain what little I am doing with Hadoop…
Superstar-Doug!!!
A small fan :- Me
And the real Hadoop