Introduction to Big Data Analytics on Hadoop - SpringPeople

© SpringPeople Software Private Limited, All Rights Reserved.

Upload: springpeople

Post on 02-Jul-2015

Category:

Data & Analytics


DESCRIPTION

48 hours of video are uploaded to YouTube every minute, which adds up to nearly 8 years of content every day. This is where Big Data analytics comes in, making such huge amounts of data easier to manage. A brief introduction to Big Data analytics on Hadoop.

TRANSCRIPT

Page 1: Introduction To Big Data Analytics On Hadoop - SpringPeople

Introduction to Big Data Analytics on Hadoop

Page 2: Introduction To Big Data Analytics On Hadoop - SpringPeople

What is Big Data?

Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become.

Page 3: Introduction To Big Data Analytics On Hadoop - SpringPeople

What Is Hadoop?

It is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Page 4: Introduction To Big Data Analytics On Hadoop - SpringPeople

What is HDFS?

• The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications.

• HDFS is like the bucket of the Hadoop system: You dump in your data and it sits there all nice and cozy until you want to do something with it, whether that's running an analysis on it within Hadoop or capturing and exporting a set of data to another tool and performing the analysis there.
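
To make "store very large data sets reliably" a little more concrete, here is a minimal sketch of how a large file can be cut into fixed-size blocks with each block copied to several machines. It is a toy model, not HDFS code: the 128 MB block size and replication factor of 3 are the defaults in recent Hadoop releases, while the round-robin placement, the function name plan_blocks, and the node names are assumptions made for illustration (real HDFS uses a rack-aware placement policy).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # bytes; assumed default block size
REPLICATION = 3                 # assumed default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, [datanode, ...]) placements for one file."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # Toy round-robin placement; real HDFS uses a rack-aware policy.
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, replicas))
    return plan


# Hypothetical four-node cluster storing a 1 GiB file.
nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = plan_blocks(1024 * 1024 * 1024, nodes)
print(len(placement), "blocks")
print(placement[0])
```

Because every block lives on several nodes, losing one machine does not lose the data, which is the reliability property the slide refers to.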

Page 5: Introduction To Big Data Analytics On Hadoop - SpringPeople

Architecture Of HDFS

Page 6: Introduction To Big Data Analytics On Hadoop - SpringPeople

About MapReduce

MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.

The framework is divided into two parts:

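• Map, a function that runs in parallel on different nodes of the distributed cluster, each copy processing its own portion of the input and emitting intermediate key-value pairs.

• Reduce, another function that collates the intermediate values for each key and resolves them into a single result per key.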
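
The two roles can be sketched in a single Python process using the classic word-count example. This is only an illustration of the programming model, not Hadoop's Java API: the function names map_phase, shuffle, and reduce_phase are hypothetical, and a real cluster would run many map and reduce tasks on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values collected for one key into a single result."""
    return key, sum(values)


# Hypothetical input lines; on a cluster each line could be mapped on a different node.
lines = ["big data on hadoop", "hadoop stores big data"]
intermediate = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(word_counts)
```

Each call to map_phase depends only on its own line, which is what lets the framework spread the map work across nodes.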

Page 7: Introduction To Big Data Analytics On Hadoop - SpringPeople

Pig Latin Statement

A Pig Latin statement is a command that produces a relation. A relation is simply a data bag with a name. That name is called the relation's alias. The simplest Pig Latin statement is LOAD, which reads a relation from a file in the file system. Other Pig Latin statements process one or more input relations, and produce a new relation as a result.
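
The idea of named relations flowing from one statement to the next can be mimicked in plain Python. The sketch below is an analogy only and is not Pig Latin syntax: the helper names load and filter_rel, the sample data, and the two-column layout are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical input; in Pig this would be a file in the file system.
RAW = "alice,34\nbob,27\ncarol,45\n"


def load(source):
    """Analogue of LOAD: read rows from a file-like object into a bag of tuples."""
    return [(name, int(age)) for name, age in csv.reader(source)]


def filter_rel(relation, predicate):
    """Analogue of a filtering statement: one input relation in, a new relation out."""
    return [row for row in relation if predicate(row)]


# Each variable name plays the role of a relation's alias.
people = load(io.StringIO(RAW))
over_30 = filter_rel(people, lambda row: row[1] > 30)
print(over_30)
```

As in Pig, the input relation people is left unchanged and the result is a new relation with its own alias.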

Page 8: Introduction To Big Data Analytics On Hadoop - SpringPeople

Data Preparation & Management

• Types of variables

• Identifying the business Y

• Basic Statistics

• Merging and Appending data – Primary key concept

• Missing values

• Outliers
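
Three of the topics above (merging on a primary key, missing values, and outliers) can be sketched with the Python standard library. The records, field names, and helper functions are hypothetical, and median imputation and the 1.5 × IQR rule are just one common choice for each step, not necessarily the method taught in the course.

```python
import statistics

# Hypothetical records keyed by customer_id, which acts as the primary key.
customers = {1: {"region": "south"}, 2: {"region": "north"}, 3: {"region": "east"}}
spend = {1: 120.0, 2: None, 3: 95.0}


def merge_on_key(left, right, field):
    """Merge: attach right[key] to every left record that shares the same key."""
    return {k: {**rec, field: right.get(k)} for k, rec in left.items()}


def fill_missing(records, field):
    """Replace missing values of a field with the median of the observed values."""
    observed = [r[field] for r in records.values() if r[field] is not None]
    median = statistics.median(observed)
    return {
        k: {**r, field: median if r[field] is None else r[field]}
        for k, r in records.items()
    }


def iqr_outliers(values):
    """Flag values lying more than 1.5 * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


merged = merge_on_key(customers, spend, "spend")
cleaned = fill_missing(merged, "spend")
print(cleaned[2])
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```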

Page 9: Introduction To Big Data Analytics On Hadoop - SpringPeople

Data Visualization

• Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly.

• Visualizations help people see things that were not obvious to them before. Even when data volumes are very large, patterns can be spotted quickly and easily.

• Visualizations convey information in a universal manner and make it simple to share ideas with others.

Page 10: Introduction To Big Data Analytics On Hadoop - SpringPeople

Normal Distribution

• A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme.

• Normal distribution curves are sometimes designed with a histogram inside the curve. The graphs are commonly used in mathematics, statistics and corporate data analytics.
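
The "cluster in the middle, taper off symmetrically" behaviour can be checked numerically. This short sketch uses NormalDist from Python's standard library (available since Python 3.8) with an assumed mean of 0 and standard deviation of 1; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (assumed for illustration).
standard = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of values expected within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")

# Symmetry: the curve has the same height at equal distances either side of the mean.
print(standard.pdf(-1.5) == standard.pdf(1.5))
```

Roughly 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.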

Page 11: Introduction To Big Data Analytics On Hadoop - SpringPeople

Hypothesis Testing

Hypothesis testing refers to the process of choosing between competing hypotheses about a probability distribution, based on observed data from the distribution.

Two common tests are:

• t-test

• ANOVA (analysis of variance)
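
As a rough illustration of the first test, the sketch below computes the t statistic for two independent samples by hand. It assumes the Welch (unequal-variance) form of the t-test, which is one of several variants; the function name welch_t and the sample values are hypothetical, and the p-value lookup is left out because the standard library has no t-distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom for two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.5, 5.7, 5.3]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A large absolute t value relative to the t-distribution with the computed degrees of freedom is evidence against the hypothesis that both groups share the same mean.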

Page 12: Introduction To Big Data Analytics On Hadoop - SpringPeople

Deductive Vs Inductive Reasoning

• Deductive reasoning happens when a researcher works from the more general information to the more specific. Sometimes this is called the “top-down” approach because the researcher starts at the top with a very broad spectrum of information and they work their way down to a specific conclusion.

• Inductive reasoning works the opposite way, moving from specific observations to broader generalizations and theories. This is sometimes called a “bottom-up” approach. The researcher begins with specific observations and measures, then begins to detect patterns and regularities, formulates some tentative hypotheses to explore, and finally ends up developing some general conclusions or theories.

Page 13: Introduction To Big Data Analytics On Hadoop - SpringPeople

Become a Big Data Expert in Just 2 Days

The BigData Analytics on Hadoop course will teach you all you need to learn about Big Data analytics on Hadoop.

Page 14: Introduction To Big Data Analytics On Hadoop - SpringPeople

Suggested Audience

• Data analysts / Data scientists who want to know how to use their expertise on Big Data

• Database Managers with a knowledge of Hadoop / Java who want to know what to do next in their career and how to manage and draw insights from their data

• Consultants who want to know what Big Data analytics is.

Page 15: Introduction To Big Data Analytics On Hadoop - SpringPeople

For further info/assistance contact

[email protected]

+91 80 656 79700

www.springpeople.com
