Big Data with Hadoop and Cloud Computing (http://clean-clouds.com)

Uploaded by mitesh-soni on 14-Jan-2015



TRANSCRIPT

Page 1: Big Data with Hadoop and Cloud Computing

http://clean-clouds.com

Big Data with Hadoop and Cloud Computing

Page 2: Big Data with Hadoop and Cloud Computing

Researcher’s Blog - http://clean-clouds.com

Why “Big Data Processing” Is Relevant for Enterprises

• Big Data used to be discarded, or archived un-analyzed.

– The result: loss of information, insight, and opportunities to extract new value.

• How is Big Data beneficial?

– Energy companies - geophysical analysis.

– Science and medicine - empirical, data-driven analysis is growing faster than experimentation

– Disney - customer behavior patterns across its stores and theme parks

• Pursuit of a “Competitive Advantage” is the driving factor for Enterprises

– Data mining (log processing, click-stream analysis, similarity algorithms, etc.), financial simulation (Monte Carlo simulation), file processing (resizing JPEGs), web indexing
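The “financial simulation (Monte Carlo simulation)” workload named above is a minimal sketch away: the idea is to run many randomized trials and read a probability off the results. The parameters below (daily drift, volatility, trading days) are illustrative assumptions, not figures from the deck:

```python
import random

# Hypothetical Monte Carlo sketch: estimate the probability that a
# portfolio ends the year below its starting value, assuming normally
# distributed daily returns (illustrative parameters, not real market data).
random.seed(42)  # fixed seed so the run is repeatable

def simulate_year(mu=0.0003, sigma=0.01, days=252):
    """Simulate one year of compounded daily returns."""
    value = 1.0
    for _ in range(days):
        value *= 1.0 + random.gauss(mu, sigma)
    return value

trials = 10_000
losses = sum(1 for _ in range(trials) if simulate_year() < 1.0)
loss_probability = losses / trials
print(f"Estimated probability of a losing year: {loss_probability:.2%}")
```

Each trial is independent, which is exactly why this class of workload parallelizes so well across a Hadoop or EC2 cluster: trials can be farmed out to workers and the tallies merged at the end.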

Page 3: Big Data with Hadoop and Cloud Computing


Cloud Computing brings economy to Big Data Processing

• Big Data Processing can be implemented on HPC or in the Cloud.

1) An HPC implementation is very costly w.r.t. CAPEX & OPEX.

2) Cloud Computing is cost-efficient because of its pay-per-use nature.

• MapReduce programming model is used for processing big data sets.

• Pig, Hive, Hadoop, … are used for Big Data Processing

– Pig - high-level dataflow language (Pig Latin) for transforming datasets

– Hive - SQL-like (HiveQL) data analysis on data in Hadoop

– Hadoop - processes vast amounts of data (focal point)

• Use EC2 instances to analyze “Big Data” on Amazon IaaS.

• Amazon Elastic MapReduce reduces complex setup & management.
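The MapReduce programming model mentioned above can be sketched in-process in a few lines of Python. This is a toy word count, not Hadoop itself: the map, shuffle, and reduce phases run locally here, where a real cluster would distribute them across machines:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in the split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["Big Data with Hadoop", "Hadoop processes Big Data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts["hadoop"] == 2, counts["big"] == 2, counts["processes"] == 1
```

Because mappers and reducers are pure functions over key/value pairs, the same job scales from this toy to thousands of nodes without changing the user code, which is what Pig, Hive, and Elastic MapReduce build on.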

Page 4: Big Data with Hadoop and Cloud Computing


Cost Comparison of Alternatives

Use case: analyze Next-Generation Sequencing data to understand the genetics of cancer.

• HPC - 100 steady & 200 peak-load servers; 68.4 GB memory, 1,690 GB storage

– CAPEX & OPEX; time-consuming set-up; management of Hadoop clusters

• Amazon IaaS - 400 reserved + 600 on-demand Standard Extra Large instances; 15 GB RAM, 1,690 GB storage

– Time-consuming set-up; management of Hadoop clusters

• Amazon Elastic MapReduce - elastic: 1,000 Standard Extra Large instances; 15 GB RAM, 1,690 GB storage

– Elastic, easy to use, reliable; automatic turn-off of resources

As per the Amazon EC2 cost comparison calculator: $377,395 for Elastic MapReduce versus $1,746,769 for the self-managed alternatives.
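The arithmetic behind the two cost figures on this slide ($1,746,769 self-managed versus $377,395 for Elastic MapReduce) works out as follows:

```python
# Savings implied by the slide's cost comparison figures.
self_managed = 1_746_769       # self-managed alternative, in USD
elastic_mapreduce = 377_395    # Amazon Elastic MapReduce, in USD

savings = self_managed - elastic_mapreduce      # absolute savings
savings_pct = 100 * savings / self_managed      # relative savings
print(f"${savings:,} saved (~{savings_pct:.1f}%)")  # → $1,369,374 saved (~78.4%)
```

In other words, the managed service quoted here costs roughly a fifth of the self-managed alternative for the same workload, which is the economic argument the slide is making.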

Page 5: Big Data with Hadoop and Cloud Computing


Future Direction

• Current experiments & identified areas

– Social network analysis

– Managing data centers

– Collective intelligence - algorithms and visualization techniques

– Predictive analytics

• Accelerator exploration

– Apache Whirr - cloud-neutral way to run services

– Apache Mahout - scalable machine-learning library

– Cascading - framework for defining and executing fault-tolerant data-processing workflows

– HAMA - distributed computing framework

• Exploration of a LAMP-like stack for Big Data aggregation, processing, and analytics

Page 11: Big Data with Hadoop and Cloud Computing

Thank You