Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing
Erich Hochmuth, Mark Seidenstricker, Bala Venkatrao, Aparna Ramani
Hadoop World 2012, New York, October 25th, 2012


DESCRIPTION

Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager.

TRANSCRIPT

Page 1: Strata + Hadoop World 2012: Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing

Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing

Erich Hochmuth, Mark Seidenstricker

Bala Venkatrao, Aparna Ramani

• Hadoop World 2012, New York, October 25th, 2012

Page 2

Agenda

• Introductions
• Monsanto Hadoop Use Case
  • Operational Challenges
  • How Monsanto leverages Cloudera Manager & Product Demo
  • Key benefits of using Cloudera Manager
• Cloudera Manager
  • Overview
  • Key Features
  • Roadmap
• Q&A

Page 3

Introductions

Monsanto
• Erich Hochmuth – R&D IT Data & Analytics Lead
• Mark Seidenstricker – Infrastructure R&D Architect

Cloudera
• Bala Venkatrao – Director, Products
• Aparna Ramani – Director, Engineering

Page 4

Monsanto Serves Farmers Around the World
Working With Growers Large and Small, Row Crops and Vegetables

Page 5

Monsanto’s Approach to Driving Yield
A System of Agriculture Working Together to Boost Productivity

• BIOTECHNOLOGY: The science of improving plants by inserting genes into their DNA
• BREEDING: The art and science of combining genetic material to produce a new seed
• AGRONOMICS: The farm management practices involved in growing plants

Page 6

Increasing Yield through Big Data
At the Cornerstone of Yield Increases is Information & Analytics

Volume
• PBs of NGS (next-generation sequencing) data
• Tens of TBs of genomic data
• TBs of yield data
• Billions of genotyping data points (dps)

Variety
• Raw sequence data
• Unstructured sensor data
• Poly-structured genomic data
• Spatial data

Velocity
• Tens of millions of yield dps/day
• Hundreds of millions of genotyping dps/day
• TBs of NGS data/week

Increased Yield

Page 7

What are the Challenges of Managing a Hadoop Cluster?

Software Provisioning & Configuration Management
• Automated & simplified installation/patch management
• Streamlined cluster configuration

Enterprise-Ready Tools
• Enterprise-grade monitoring & management capabilities
• Integration with the existing enterprise IT stack

Reporting & Monitoring
• Proactive monitoring & alerting
• Capacity planning

Support
• Midwest location
• Lack of Hadoop expertise

Page 8

What are the Solutions?

With Cloudera Manager, you get…

Intuitive Management Console
• Mission-control-style dashboard for the entire cluster
• Centralized management of the entire Hadoop ecosystem
• Treat the cluster as an appliance
• Configuration change audit & validation

Integration with Enterprise IT Management Tools
• Connect to corporate LDAP
• Cloudera Manager API integrates with the existing BMC platform

Comprehensive Monitoring & Alerting
• Proactive service-level alerts
• Summarized cluster-level graphs & charts
• Real-time series charts (MapReduce & HBase)

Historical Cluster Metrics/Reports
• Capacity planning: disk usage/slot capacity
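Disk-usage capacity planning essentially comes down to summing storage per owner, group or directory. The minimal Python sketch below shows that aggregation under an assumption: a list of file records (path, owner, group, size) has already been exported from HDFS by some other means. The `FileRecord` type and the helper names are hypothetical and are not part of Cloudera Manager.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical export of one HDFS file's metadata (not a Cloudera Manager type)."""
    path: str
    owner: str
    group: str
    size_bytes: int


def top_level_dir(path: str) -> str:
    """Return the first path component, e.g. '/user' for '/user/alice/x.bam'."""
    parts = [p for p in path.split("/") if p]
    return "/" + parts[0] if parts else "/"


def usage_by(records: Iterable[FileRecord],
             key: Callable[[FileRecord], str]) -> Dict[str, int]:
    """Sum file sizes in bytes, grouped by whatever `key` returns for each record."""
    totals: Dict[str, int] = defaultdict(int)
    for rec in records:
        totals[key(rec)] += rec.size_bytes
    return dict(totals)


if __name__ == "__main__":
    # Illustrative sample data only.
    records = [
        FileRecord("/user/alice/run1.fastq", "alice", "genomics", 700),
        FileRecord("/user/bob/yield.csv", "bob", "agronomy", 200),
        FileRecord("/data/ngs/batch7.bam", "alice", "genomics", 1_100),
    ]
    print(usage_by(records, lambda r: r.owner))                # {'alice': 1800, 'bob': 200}
    print(usage_by(records, lambda r: r.group))                # {'genomics': 1800, 'agronomy': 200}
    print(usage_by(records, lambda r: top_level_dir(r.path)))  # {'/user': 900, '/data': 1100}
```

Passing a key function instead of a fixed field name lets the same helper produce the by-user, by-group and by-directory views.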

Page 9

What are the Benefits of Cloudera Manager?

Lowers the barrier for Hadoop administration
• No need to rely solely on experts
• Reduces the number of administrators needed

Provides a “one-stop” holistic view
• Easy to understand how the overall cluster is performing

Includes pre-tuned configuration with best practices
• Get straight to solving the business problem

Integrates with Cloudera support
• Leverage the real experts…not just for bugs

Page 10

Cloudera Enterprise – The Platform for Big Data

Page 11

Why Do You Need Cloudera Manager?

Complexity
Hadoop is more than a dozen services running across many machines
• Hundreds of hardware components
• Thousands of settings
• Limitless permutations

Context
Hadoop is a system, not just a collection of parts
• Everything is interrelated
• Raw data about individual pieces is not enough
• Must extract what’s important

Efficiency
Managing Hadoop with multiple tools & manual processes takes longer
• Complicated, error-prone workflows
• Longer issue resolution
• Lack of consistent & repeatable processes

Page 12

Cloudera Manager
End-to-End Administration for CDH

1. Deploy: Install, configure & start your cluster in 3 simple steps
2. Configure & Optimize: Ensure optimal settings for all hosts & services
3. Monitor, Diagnose & Report: Find & fix problems quickly; view current & historical activity & resource usage

Page 13

Managing Complexity

One Tool For Everything
• Deployment & Configuration
• Monitoring
• Workflows
• Events & Alerts
• Log Search
• Diagnostics
• Reporting
• Activity Monitoring

[Graphic comparing Cloudera Enterprise with a do-it-yourself approach]

“In a recent Cloudera survey, >95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop Operations”

Page 14

Raw Data vs. Hadoop Intelligence
Providing Context

1. Smart Configuration: Auto-sets configurations & guards against user error
2. Workflows: Ensures that multi-step tasks are accomplished completely & in the correct sequence
3. Dependencies: Aware of how a particular action affects the rest of the cluster & manages the impact
4. Events & Alerts: Makes you aware of what’s important at a Hadoop system level
5. History: Compares current & past activities for context
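The Workflows and Dependencies points amount to ordering multi-step operations so that each service starts only after the services it relies on, and stops in the reverse order. A minimal sketch of that ordering using Python's standard `graphlib.TopologicalSorter` (Python 3.9+) follows. The dependency map is a simplified, hypothetical example of a CDH4-era stack, not Cloudera Manager's internal model.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical, simplified dependency map: service -> services that must be running first.
DEPENDENCIES: Dict[str, Set[str]] = {
    "zookeeper": set(),
    "hdfs": set(),
    "hbase": {"hdfs", "zookeeper"},
    "mapreduce": {"hdfs"},
    "hive": {"mapreduce"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order services so every dependency starts before its dependents.

    Raises graphlib.CycleError if the dependency map contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Stop dependents first: the reverse of a valid start order."""
    return list(reversed(start_order(deps)))


if __name__ == "__main__":
    # Ties (e.g. zookeeper vs. hdfs) may come out in either order.
    print("start:", start_order(DEPENDENCIES))
    print("stop: ", stop_order(DEPENDENCIES))
```

A restart can then be expressed as a stop in `stop_order` followed by a start in `start_order`, which keeps multi-step tasks in a correct sequence without hand-maintained runbooks.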

Page 15

Cloudera Manager Key Features

• Automated Deployment: Installs the complete Hadoop stack in minutes via a wizard-based interface
• Centralized Management: Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface
• Multi-Cluster Management: Allows you to manage multiple clusters from a single instance of Cloudera Manager
• LDAP Authentication: Integrates Cloudera Manager with Active Directory
• Global Time Control: Establishes the time context globally for almost all views, and correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis
• Service & Configuration Management: Set server roles, configure services and manage security across the cluster; gracefully start, stop and restart services as needed
• Role-Based Administration: Supports Administrator and Read-Only users
• Audit Trails: Maintains a complete record of configuration changes with the ability to roll back to previous states
• Proactive Health Checks: Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
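Proactive health checks compare service metrics against thresholds and raise an alert before a critical limit is reached. The Python below sketches that idea. The metric names, the threshold values and the GOOD/CONCERNING/BAD labels are illustrative assumptions, not Cloudera Manager's actual checks or defaults.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    """Hypothetical warning/critical limits for a metric where higher is worse."""
    warning: float
    critical: float


def classify(value: float, limits: Threshold) -> str:
    """Map one metric sample to an illustrative health state."""
    if value >= limits.critical:
        return "BAD"
    if value >= limits.warning:
        return "CONCERNING"
    return "GOOD"


def alerts(samples: Dict[str, float], limits: Dict[str, Threshold]) -> List[str]:
    """Return one alert message per metric that is not in the GOOD state.

    Metrics without configured thresholds are skipped; output is sorted by metric name.
    """
    messages = []
    for name, value in sorted(samples.items()):
        if name not in limits:
            continue
        state = classify(value, limits[name])
        if state != "GOOD":
            messages.append(f"{state}: {name}={value}")
    return messages


if __name__ == "__main__":
    # Hypothetical metric names and limits, for illustration only.
    limits = {
        "hdfs_capacity_used_pct": Threshold(warning=80.0, critical=95.0),
        "datanodes_dead": Threshold(warning=1, critical=3),
    }
    samples = {"hdfs_capacity_used_pct": 86.5, "datanodes_dead": 0}
    print(alerts(samples, limits))  # ['CONCERNING: hdfs_capacity_used_pct=86.5']
```

Separating a warning level from a critical level is what makes the alerting "proactive": the CONCERNING state fires while there is still headroom to act.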

Page 16

Cloudera Manager Key Features (Contd.)

• Intelligent Log Management: Gather, view and search Hadoop logs collected from across the cluster; scans Hadoop logs for irregularities and warns you before they impact the cluster
• Event Management: Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching
• Alerting: Generates email alerts when certain events occur
• Activity Monitoring: Consolidates all cluster activity into a single, real-time view
• Host-Level Monitoring: View information about the hosts in your cluster, including status, resident memory, virtual memory and roles
• Heatmaps: Visualize health status and metrics across the cluster to quickly identify problem nodes and take action
• Operational Reports: Visualize current and historical disk usage by user, group and directory; track MapReduce activity on the cluster by job or user
• Support Integration: Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution
• Comprehensive API: Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools
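Monsanto connects this API to its existing BMC platform. As a rough sketch of how an external monitoring tool might consume it, the Python below polls Cloudera Manager's REST interface over HTTP with basic authentication and picks out services that are not reporting good health. The `/api/v1/clusters/<name>/services` path, the `items`, `name` and `healthSummary` fields, the host name `cm-host`, port 7180 and the example cluster name are assumptions to check against the API documentation for your Cloudera Manager version; the BMC side is not shown.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List, Tuple
from urllib.parse import quote


def services_url(base_url: str, cluster: str, api_version: str = "v1") -> str:
    """Build the (assumed) endpoint that lists a cluster's services."""
    # Cluster names may contain spaces, so percent-encode the path segment.
    return (f"{base_url.rstrip('/')}/api/{api_version}"
            f"/clusters/{quote(cluster, safe='')}/services")


def fetch_json(url: str, user: str, password: str, timeout: float = 10.0) -> Any:
    """GET a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def unhealthy_services(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (service name, health) for every service whose health is not GOOD.

    Field names are assumptions; a missing health field is reported as UNKNOWN.
    """
    return [(svc.get("name", "?"), svc.get("healthSummary", "UNKNOWN"))
            for svc in payload.get("items", [])
            if svc.get("healthSummary", "UNKNOWN") != "GOOD"]


if __name__ == "__main__":
    # Offline example with a canned response; a live call would look like
    # fetch_json(services_url("http://cm-host:7180", "Cluster 1 - CDH4"), user, password)
    sample = json.loads('{"items": [{"name": "hdfs1", "healthSummary": "GOOD"},'
                        ' {"name": "hbase1", "healthSummary": "BAD"}]}')
    print(services_url("http://cm-host:7180", "Cluster 1 - CDH4"))
    print(unhealthy_services(sample))  # [('hbase1', 'BAD')]
```

Keeping URL building, HTTP access and response parsing in separate functions means the parsing logic can be tested against saved responses without a running Cloudera Manager server.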

Page 17

Cloudera Manager Roadmap

• Cloudera Manager 4.1 – Released 10/24
  • Platform support for CDH4.1
  • Cloudera Impala management & monitoring
  • New monitoring: ZooKeeper, Flume NG
  • Maintenance Mode
  • Host Decommissioning
  • Several usability enhancements
• Cloudera Manager 4.5 – Early 2013
  • Rolling upgrades/restarts
  • Enhanced monitoring, cluster heatmaps, etc.
  • Role groups configuration
  • Cloud support
  • Others: SNMP support, error handling, ISV integration, etc.

Page 18

Why Cloudera Manager?

• Simple: End-to-end Hadoop administration in a single tool
• Intelligent: Manages Hadoop at a system level; Cloudera’s experience realized in software
• Efficient: Simplifies complex workflows & makes administrators more productive
• Best-in-Class: The only enterprise-grade Hadoop management application available

Page 19

Next Steps

• Try out the FREE edition of Cloudera Manager
• Download from: http://www.cloudera.com/products-services/tools/
• Support available via [email protected]
• For Cloudera Enterprise subscriptions, please contact: [email protected]

Page 20

Q&A

Page 21

Page 22

Cloudera Manager

Key Features

Page 23

Install a Cluster in 3 Simple Steps

1. Find Nodes: Enter the names of the hosts that will be included in the Hadoop cluster, then click Continue.
2. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
3. Assign Roles: Verify the roles of the nodes within your cluster and make changes as necessary.

Page 24

View Service Health & Performance

Page 25

Get Host-Level Snapshots

Page 26

Monitor & Diagnose Cluster Workloads

Page 27

Gather, View & Search Hadoop Logs

Page 28

Track Events From Across the Cluster

Page 29

Report on System Performance & Usage

Page 30

Visualize Health Status With Heatmaps

Page 31

Manage Multiple CDH Clusters

Page 32

Easily Configure High Availability

Page 33

Set the Time Context Globally