hadoop,big data analytics and more

32
Trendwise Analytics Copyright Trendwise Analytics Bangalore Hadoop Meetup – 6 3rd August 2013 Video

Upload: trendwise-analytics

Post on 27-Jan-2015

109 views

Category:

Technology


1 download

DESCRIPTION

A discussion on all things Big Data, in a single presentation.

TRANSCRIPT

Page 1: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

Bangalore Hadoop Meetup – 6 3rd August 2013

Video

Page 2: Hadoop,Big Data Analytics and More

Trendwise Analytics

Agenda

Introduction to Big Data

Page 3: Hadoop,Big Data Analytics and More

Trendwise Analytics

Market opportunity

IDC, a research firm, predicts that the market for Big Data technology and services will reach $16.9 billion by 2015, up from $3.2 billion in 2010. That is a 40 percent-a-year growth rate — about seven times the estimated growth rate for the overall information technology and communications business, according to IDC.

Billions and billions: big data becomes a big deal :

Deloitte predicts that in 2012, “big data” will likely experience accelerating growth and market penetration.

Page 4: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

Forecast growth of Hadoop Job Market

Source: Indeed -- http://www.indeed.com/jobtrends/Hadoop.html

Page 5: Hadoop,Big Data Analytics and More

Trendwise Analytics

What is Big Data

Page 6: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

Internet Minute ....

Page 7: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

Other sources

• A commercial aircraft generates 3GB of flight sensor data in 1 hour

An ERP system for an mid size company grows by 1-2TB annually

A Video Suveillance Camera generates 1-3TB data in 3 months

Airtel or Vodafone generates 3TB of Call Details Records (CDR) every day

Every day 2.5 quintillion (2.5×10^18) bytes of data is created

i.e., 2,500,000TB

Page 8: Hadoop,Big Data Analytics and More

Trendwise Analytics

How are Companies using Big Data Technology?

Page 9: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics917 Aug 2013

Watson wins Jeopardy!

Feb 14th 2011 – Watson wins Jeopardy! beating its human opponents.

Watson is IBM’s super computer built using Big Data Technology.

Page 10: Hadoop,Big Data Analytics and More

Trendwise Analytics

Across All Industries

Web app optimization

Smart meter monitoring

Equipment monitoring

Advertising analysis

Life sciences research

Fraud detection

Healthcare outcomes

Weather forecasting

Natural resource exploration

Social network analysis

Churn analysis

Traffic flow optimization

IT infrastructure optimization

Legal discovery

Page 11: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

What are the types of business problems?

Source: Cloudera “Ten Common Hadoopable Problems”

Page 12: Hadoop,Big Data Analytics and More

Trendwise Analytics

Companies using Big Data

12

New $1B corporate center for software and analytics

Hiring 400 data scientists

Includes financial and marketing applications,

but with special focus on industrial uses of big data

When will this gas turbine need maintenance?

Ford collects and aggregates data from the 4 million vehicles that use in-car sensing and remote app management software

The data allows to glean information on a range of issues, from how drivers are using their vehicles, to the driving environment that could help them improve the quality of the vehicle

Partnered with Microsoft to develop SYNC

Amazon has been collecting customer information for years--not just addresses and payment information but the identity of everything that a customer had ever bought or even looked at.

They’re using that data to build customer relationship

AT&T has 300 million customers

A team of researchers is working to turn data collected through the company’s cellular network into a trove of information for policymakers, urban planners and traffic engineers.

The researchers want to see how the city changes hourly by looking at calls and text messages relayed through cell towers around the region, noting that certain towers see more activity at different times

Page 13: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

TECHNOLOGY

Hadoop Non-Hadoop

Page 14: Hadoop,Big Data Analytics and More

Trendwise Analytics

What is Hadoop?Open source project started by Doug Cutting

A platform to manage Big Data

Helps in Distributed computing

Runs on Commodity Hardware

Page 15: Hadoop,Big Data Analytics and More

Trendwise Analytics

Hadoop is a set of Apache Frameworks and more…

Data storage (HDFS) Runs on commodity hardware (usually Linux) Horizontally scalable

Processing (MapReduce) Parallelized (scalable) processing Fault Tolerant

Other Tools / Frameworks Data Access

HBase, Hive, Pig, Mahout Tools

Hue, Sqoop Monitoring

Greenplum, Cloudera

Hadoop Core - HDFS

MapReduce API

Data Access

Tools & Libraries

Monitoring & Alerting

Page 16: Hadoop,Big Data Analytics and More

Trendwise Analytics

What are the core parts of a Hadoop distribution?

Page 17: Hadoop,Big Data Analytics and More

A View of Hadoop (from Hortonworks)

Source: “Intro to Map Reduce” -- http://www.youtube.com/watch?v=ht3dNvdNDzI

Page 18: Hadoop,Big Data Analytics and More

Hadoop Cluster HDFS (Physical) Storage

Page 19: Hadoop,Big Data Analytics and More

Trendwise Analytics

MapReduce Job – Logical View

Image from - http://mm-tom.s3.amazonaws.com/blog/MapReduce.png

Page 20: Hadoop,Big Data Analytics and More

Image from: http://blog.jteam.nl/wp-content/uploads/2009/08/MapReduceWordCountOverview1.png

MapReduce Example - WordCount

Page 21: Hadoop,Big Data Analytics and More

Ways to MapReduce

Libraries Languages

Note: Java is most common, but other languages can be used

Page 22: Hadoop,Big Data Analytics and More

Hadoop Ecosystem

Page 23: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

Other components

Hive• Data Warehouse infrastructure

that provides data summarization and ad hoc querying on top of Hadoop

PIG• A high-level data-flow language and

execution framework for parallel computation

Sqoop• Sqoop is a tool designed to

help users of large data import existing relational databases into their Hadoop clusters

Zookeeper• Zookeeper is a centralized

service for maintaining configuration information, naming, providing distributed synchronization, and providing group services

Page 24: Hadoop,Big Data Analytics and More

Trendwise Analytics

Benefits of Hadoop

• Hadoop is designed to run on cheap commodity hardware

• It automatically handles data replication and node failure

• Handles large volumes of unstructured data easily• Last but not least – its free! ( Open source)

Page 25: Hadoop,Big Data Analytics and More

Trendwise Analytics

Copyright Trendwise Analytics

Commercial Hadoop Distributions

• Cloudera• Hortonworks• Greenplum, A Division of EMC• IBM InfoSphere BigInsights

Page 26: Hadoop,Big Data Analytics and More

Trendwise Analytics

How to get started?

1. http://hadoop.apache.org/

2. Pre-requisite:

- Linux ( Preferred OS) - Java JDK 3. Install and run a single node cluster

4. Follow the instructions on Mukul's Blog.

Page 27: Hadoop,Big Data Analytics and More

Trendwise Analytics

How it looks?

Page 28: Hadoop,Big Data Analytics and More

Trendwise Analytics

Web Interface – NameNode Tracker

Page 29: Hadoop,Big Data Analytics and More

Trendwise Analytics

Job Tracker

Page 30: Hadoop,Big Data Analytics and More

Trendwise Analytics

Task Tracker

Page 31: Hadoop,Big Data Analytics and More

Trendwise Analytics

Technology – Non Hadoop

•HPCC - HPCC Systems from LexisNexis Risk Solutions offers a proven, • open-source, data-intensive supercomputing platform

• designed for the enterprise to solve big data problems.

• SAP HANA is SAP AG’s implementation of in-memory database technology.

•No-SQL Databases – Cassandra, CouchDB, Redis

Page 32: Hadoop,Big Data Analytics and More

Trendwise Analytics

Contact us

S.Mohan Kumar

Co-Founder and CEO

Trendwise Analytics

Website: www.TrendwiseAnalytics.com

Email: [email protected]

US Tollfree: +1 877 268 2872

India Number: +91 80 4094 9600

Copyright Trendwise Analytics