Big Data Seminar (TICT (CSE) Batch 2013-2017)
WHAT IS DATA
Data is raw, unorganized facts that need to be processed. Data can be something simple, seemingly random, and useless in itself until it is organized.
DIFFERENT TYPES OF DATA
A traditional RDBMS deals only with structured data.
A technology is needed that handles semi-structured and unstructured data as well as structured data.
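As a rough illustration of the three kinds of data named above, the sketch below holds a small sample of each in Python. The sample values and the names `structured_rows`, `semi_structured`, `unstructured`, and `field_names` are made up for illustration and do not come from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, the shape an RDBMS table expects.
structured_csv = "id,name,balance\n1,Asha,2500\n2,Ravi,1200\n"
structured_rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing, but fields may vary per record.
semi_structured = json.loads(
    '[{"user": "asha", "tags": ["bank", "loan"]}, {"user": "ravi"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Asha called the branch about her loan and asked for a callback."


def field_names(records):
    """Return the sorted field names found in each record."""
    return [sorted(record.keys()) for record in records]


print(field_names(structured_rows))  # every row has the same columns
print(field_names(semi_structured))  # the fields differ between records
print(len(unstructured.split()))     # only generic operations apply, such as counting words
```

The structured rows always share one schema, which is why they fit a relational table; the other two kinds need tooling that does not assume fixed columns.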
Traditional Concept of Data Storage
Organizations: Banking Sector, Stock Exchange, Hospital, Social Media, Online Shopping, Others
Extract Data -> Transform Data -> Load into Database
End Users: Generate Reports & Perform Analytics
As data grows, managing and processing it becomes difficult.
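The extract, transform, load, and report steps of the traditional approach can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the sample data `RAW`, the table name `txn`, and all function names are hypothetical and not taken from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical source export: raw transactions from some organization.
RAW = "account,amount\nA1, 100.50\nA2,abc\nA1,49.50\n"


def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values and drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append((row["account"].strip(), float(row["amount"])))
        except ValueError:
            continue  # skip malformed amounts such as "abc"
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS txn (account TEXT, amount REAL)")
    conn.executemany("INSERT INTO txn VALUES (?, ?)", rows)
    conn.commit()


def report(conn):
    """End-user report: total amount per account."""
    cur = conn.execute(
        "SELECT account, SUM(amount) FROM txn GROUP BY account ORDER BY account"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
totals = report(conn)
print(totals)
```

Every step here runs on a single machine against a single database, which is the part that stops scaling once the data volume grows.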
Drawbacks of Using the Traditional Approach
Expensive, Time Consuming, Scalability, Storage Size, Resource Failure
The Model of Generating or Consuming Data Has Changed
OLD MODEL - A few companies generate the data; all others consume it.
NEW MODEL - All of us generate the data, and all of us consume it.
BIG DATA
WHAT IS BIG DATA
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It requires new architectures, new techniques, and various tools and frameworks.
Definition of BIG DATA
Different Sources of DATA
SOURCES
WHERE BIG DATA IS USED
IT Industries
Manufacturing Industries
Telecommunications
Banking sector
Healthcare
CHALLENGES IN HANDLING BIG DATA
There are two main challenges in handling big data:
1. How do we store and manage such a huge volume of data efficiently?
2. How do we process and extract valuable information from the huge volume of data within a given time frame?
BRIEF HISTORY OF HADOOP
WHAT IS HADOOP
Hadoop is an open-source framework designed to store and process huge volumes of data efficiently.
Hadoop is a platform that provides both distributed storage and computational capabilities.
Why HADOOP Is Used
MAJOR COMPONENTS OF HADOOP ECOSYSTEM
HADOOP COMPONENTS
HADOOP DISTRIBUTED FILE SYSTEM: Storage
Google MAPREDUCE ALGORITHM: Processing
HADOOP ECOSYSTEM
Flume: Semi-Structured or Unstructured Data
Sqoop: Structured Data
Import or Export
Features of Hadoop
Cost Effective System (Uses Commodity Machines)
Large Cluster of Nodes (Processing Power & Storage Capacity Increase)
Parallel Processing (Less Time Is Required to Store & Access the Data)
Distributed Data (Data Is Distributed Across Different Nodes)
Automatic Failover Management
Heterogeneous Cluster
Scalability
How The Data Is Stored In Hadoop Clusters
Rack 1, Rack 2
Node 1, Node 2, Node 3, Node 4
Hadoop Distributed File System
Client
Name Node, Secondary Name Node, Job Tracker
Data Node (Task Tracker): Block5 Block2 Block4 Block3 Block1 Block6
Data Node (Task Tracker): Block4 Block1 Block3 Block2 Block6 Block5
Data Node (Task Tracker): Block1 Block2 Block3 Block4 Block5 Block6
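The idea of storing a file as replicated blocks spread over data nodes can be sketched as below. This is a simplified illustration, not Hadoop's implementation: the round-robin placement is an assumption chosen for brevity (real HDFS placement is rack-aware), and the tiny block size, the node names, and the functions `split_into_blocks` and `place_replicas` are hypothetical.

```python
from collections import defaultdict


def split_into_blocks(data, block_size):
    """Cut the file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes.

    Assumption: plain round-robin placement, used only to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = defaultdict(list)
    for block_id in range(1, num_blocks + 1):
        for r in range(replication):
            node = nodes[(block_id - 1 + r) % len(nodes)]
            placement[node].append(block_id)
    return dict(placement)


data = b"x" * 60
blocks = split_into_blocks(data, block_size=10)  # six blocks of 10 bytes
nodes = ["datanode-1", "datanode-2", "datanode-3"]
placement = place_replicas(len(blocks), nodes, replication=3)

# A name node would keep a block-to-node map like this one as metadata.
for node in nodes:
    print(node, sorted(placement[node]))
```

With three data nodes and a replication factor of 3, every node ends up holding all six blocks, which is consistent with the block layout listed for the three data nodes. Losing any single node therefore loses no data.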
MapReduce Flow
MapReduce Framework
MapReduce works by breaking the processing into two phases: the Map phase and the Reduce phase.
Input -> Split -> Map -> Shuffle & Sort -> Reduce -> Output
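The flow from input to output can be sketched with a word count, the usual introductory MapReduce example. The sketch runs in a single Python process purely to show the phases; in Hadoop the map and reduce tasks would be distributed across the cluster. The function names and the sample text are assumptions made for illustration.

```python
from collections import defaultdict


def split_input(text):
    """Split: break the input into independent chunks (one per line here)."""
    return text.splitlines()


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group the values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


text = "big data needs hadoop\nhadoop stores big data\n"
mapped = [pair for chunk in split_input(text) for pair in map_phase(chunk)]
output = [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]
print(output)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can run in parallel on different nodes; only the shuffle step needs to move data between them.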
Disadvantages
Security Concerns
Not Fit For Small Data
Future Scope of Big Data & Hadoop
Conclusion...
Source of Information
Presented By
Vishal Kumar
Sk Ibrahim Anam
Souvik Jana
Thank You