strata + hadoop world 2012: taming the elephant - learn how monsanto manages their hadoop clusters...

Post on 06-May-2015

DESCRIPTION

Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager.

TRANSCRIPT

Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing

Erich Hochmuth, Mark Seidenstricker

Bala Venkatrao, Aparna Ramani

• Hadoop World 2012, New York, October 25th, 2012

Agenda

• Introductions
• Monsanto Hadoop Use Case
  • Operational Challenges
  • How Monsanto leverages Cloudera Manager & Product Demo
  • Key benefits of using Cloudera Manager
• Cloudera Manager
  • Overview
  • Key Features
  • Roadmap
• Q&A

Introductions

• Monsanto
  • Erich Hochmuth – R&D IT Data & Analytics Lead
  • Mark Seidenstricker – Infrastructure R&D Architect
• Cloudera
  • Bala Venkatrao – Director, Products
  • Aparna Ramani – Director, Engineering

Monsanto Serves Farmers Around the World
Working With Growers Large and Small, Row Crops and Vegetables

Monsanto’s Approach to Driving Yield
A System of Agriculture Working Together to Boost Productivity

• BIOTECHNOLOGY: The science of improving plants by inserting genes into their DNA
• BREEDING: The art and science of combining genetic material to produce a new seed
• AGRONOMICS: The farm management practices involved in growing plants

Increasing Yield through Big Data
At the Cornerstone of Yield Increases is Information & Analytics

Volume
• PBs of NGS data
• 10’s TBs of genomic data
• TBs of yield data
• Billions of genotyping dps

Variety
• Raw Sequence data
• Unstructured sensor data
• Poly-structured genomic data
• Spatial data

Velocity
• 10’s millions yield dps/day
• 100’s million genotyping dps/day
• TBs of NGS data/week

Increased Yield

What are the Challenges of managing a Hadoop Cluster?

Software Provisioning & Configuration Management
• Automated & simplified installation/patch management
• Streamlined cluster configuration

Enterprise-ready Tools
• Enterprise grade monitoring & management capabilities
• Integration with existing enterprise IT stack

Reporting & Monitoring
• Proactive monitoring & alerting
• Capacity planning

Support
• Midwest Location
• Lack of Hadoop expertise

What are the Solutions?
With Cloudera Manager, you get…

Intuitive Management Console
• Mission control style dashboard for entire cluster
• Centralized management of entire Hadoop ecosystem
• Treat the cluster as an appliance
• Configuration change audit & validation

Integration with Enterprise IT Management Tools
• Connect to Corporate LDAP
• Cloudera Manager API integrates with existing BMC platform

Comprehensive Monitoring & Alerting
• Proactive service level alerts
• Summarized cluster level graphs & charts
• Real-time series charts (MapReduce & HBase)

Historical Cluster Metrics/Reports
• Capacity planning - Disk usage/ Slot Capacity
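
The capacity-planning use of historical disk-usage metrics can be illustrated with a small worked calculation. This is only a sketch under stated assumptions: growth is treated as linear, the replication factor defaults to 3 (the usual HDFS default), and the function name and all figures are hypothetical, not taken from the talk or from Cloudera Manager.

```python
def days_until_full(raw_capacity_tb, raw_used_tb, daily_ingest_tb, replication=3):
    """Estimate days until raw disk capacity is exhausted.

    daily_ingest_tb is logical (pre-replication) data per day; every
    logical TB consumes `replication` TB of raw disk. Growth is assumed
    to be linear. Returns None when usage is not growing.
    """
    raw_growth_per_day = daily_ingest_tb * replication
    if raw_growth_per_day <= 0:
        return None
    remaining_tb = max(raw_capacity_tb - raw_used_tb, 0.0)
    return remaining_tb / raw_growth_per_day


# Hypothetical cluster: 600 TB raw, 330 TB used, 1.5 TB/day of new data.
print(days_until_full(600.0, 330.0, 1.5))
```

In practice the daily ingest figure would come from the historical usage reports rather than being hard-coded.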

What are the Benefits of Cloudera Manager?

Lowers the barrier for Hadoop administration
• Do not need to rely on experts solely
• Reduces the number of administrators needed

Provides a “one-stop” holistic view
• Easy to understand how the overall cluster is performing

Includes pre-tuned configuration with best practices
• Get straight to solving the business problem

Integrates with Cloudera support
• Leverage the real experts…not just for bugs

Cloudera Enterprise – The Platform for Big Data


Why You Need Cloudera Manager?

Complexity
Hadoop is more than a dozen services running across many machines
• Hundreds of hardware components
• Thousands of settings
• Limitless permutations

Context
Hadoop is a system, not just a collection of parts
• Everything is interrelated
• Raw data about individual pieces is not enough
• Must extract what’s important

Efficiency
Managing Hadoop with multiple tools & manual process takes longer
• Complicated, error-prone workflows
• Longer issue resolution
• Lack of consistent & repeatable processes

Cloudera Manager
End-to-End Administration for CDH

1. Deploy: Install, configure & start your cluster in 3 simple steps
2. Configure & Optimize: Ensure optimal settings for all hosts & services
3. Monitor, Diagnose & Report: Find & fix problems quickly, view current & historical activity & resource usage

Managing Complexity
One Tool For Everything

• DEPLOYMENT & CONFIGURATION
• MONITORING
• WORKFLOWS
• EVENTS & ALERTS
• LOG SEARCH
• DIAGNOSTICS
• REPORTING
• ACTIVITY MONITORING

[Figure labels: CLOUDERA ENTERPRISE, DO-IT-YOURSELF]

“In a recent Cloudera survey, >95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop Operations”

Raw Data vs. Hadoop Intelligence
Providing Context

1. Smart Configuration: Auto-sets configurations & guards against user error
2. Workflows: Ensures that multi-step tasks are accomplished completely & in the correct sequence
3. Dependencies: Aware of how a particular action affects the rest of the cluster & manages the impact
4. Events & Alerts: Makes you aware of what’s important at a Hadoop system level
5. History: Compares current & past activities for context
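
The ideas of sequenced workflows and dependency awareness can be sketched with a topological sort. This is an illustration only, not how Cloudera Manager is implemented: the dependency map below is a hypothetical simplification, and the helper names are invented for the example. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services it needs
# to be running first. Real clusters differ.
DEPENDS_ON = {
    "zookeeper": set(),
    "hdfs": {"zookeeper"},
    "mapreduce": {"hdfs"},
    "hbase": {"hdfs", "zookeeper"},
}


def start_order(depends_on):
    """Return services ordered so that dependencies start first."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Stopping is the reverse: dependents go down before what they use."""
    return list(reversed(start_order(depends_on)))


def affected_by(service, depends_on):
    """Return every service that directly or indirectly depends on `service`."""
    affected = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for name, needs in depends_on.items():
            if current in needs and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


print(start_order(DEPENDS_ON))
print(sorted(affected_by("hdfs", DEPENDS_ON)))
```

The order of services that do not depend on each other (here `mapreduce` and `hbase`) is not fixed, so callers should rely only on the relative order of a service and its dependencies.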

Cloudera Manager Key Features

• Automated Deployment: Installs the complete Hadoop stack in minutes via a wizard-based interface
• Centralized Management: Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface
• Multi-Cluster Management: Allows you to manage multiple clusters from a single instance of Cloudera Manager
• LDAP Authentication: Integrates Cloudera Manager with Active Directory
• Global Time Control: Establishes the time context globally for almost all views. Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis
• Service & Configuration Management: Set server roles, configure services and manage security across the cluster. Gracefully start, stop and restart services as needed
• Role-Based Administration: Supports Administrator and Read-Only users
• Audit Trails: Maintains a complete record of configuration changes with the ability to roll back to previous states
• Proactive Health Checks: Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
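
The threshold-based health checks described above can be sketched as a two-level comparison. This is a minimal illustration, assuming each metric has a warning and a critical threshold and that higher values are worse; the metric names, threshold values and status labels are hypothetical, not Cloudera Manager's actual settings.

```python
from enum import Enum


class Health(Enum):
    GOOD = "good"
    CONCERNING = "concerning"
    BAD = "bad"


def check_threshold(value, warning, critical):
    """Classify a metric where higher values are worse."""
    if value >= critical:
        return Health.BAD
    if value >= warning:
        return Health.CONCERNING
    return Health.GOOD


def evaluate(metrics, thresholds):
    """Apply per-metric (warning, critical) thresholds to current values."""
    return {
        name: check_threshold(value, *thresholds[name])
        for name, value in metrics.items()
    }


# Hypothetical metrics expressed as fractions of capacity in use.
thresholds = {"dfs_used": (0.80, 0.95), "heap_used": (0.75, 0.90)}
metrics = {"dfs_used": 0.83, "heap_used": 0.40}
for name, status in evaluate(metrics, thresholds).items():
    print(name, status.value)
```

Alerting before the critical level is reached is what makes the check proactive: the warning threshold gives an administrator time to act.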

Cloudera Manager Key Features (Contd.)

• Intelligent Log Management: Gather, view and search Hadoop logs collected from across the cluster. Scans Hadoop logs for irregularities and warns you before they impact the cluster
• Event Management: Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities and makes them available for alerting and searching
• Alerting: Generates email alerts when certain events occur
• Activity Monitoring: Consolidates all cluster activity into a single, real-time view
• Host Level Monitoring: View information pertaining to hosts in your cluster including status, resident memory, virtual memory and roles
• Heatmaps: Visualize health status and metrics across the cluster to quickly identify problem nodes and take action
• Operational Reports: Visualize current and historical disk usage by user, group and directory. Track MapReduce activity on the cluster by job or user
• Support Integration: Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution
• Comprehensive API: Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools
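
As a rough idea of what integrating an external monitoring tool through the API might look like, the sketch below builds an authenticated HTTP request and filters a service list by health. The endpoint path (`/api/v1/clusters/<cluster>/services`), the `items` and `healthSummary` field names, the port and the sample payload are assumptions for illustration and should be checked against the API documentation for the Cloudera Manager version in use. Only the Python standard library is used.

```python
import base64
import json
import urllib.request


def build_request(base_url, path, user, password):
    """Build a GET request with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_json(request):
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def unhealthy_services(payload):
    """Return names of services whose health summary is not GOOD.

    Assumes a payload shaped like {"items": [{"name": ..., "healthSummary": ...}]}.
    """
    return [
        service["name"]
        for service in payload.get("items", [])
        if service.get("healthSummary") != "GOOD"
    ]


# Hypothetical response body, used here instead of a live server.
sample = {
    "items": [
        {"name": "hdfs1", "healthSummary": "GOOD"},
        {"name": "hbase1", "healthSummary": "BAD"},
    ]
}
print(unhealthy_services(sample))
```

Against a real server, the payload would come from something like `fetch_json(build_request("http://cm-host:7180", "/api/v1/clusters/mycluster/services", user, password))`, where the host, cluster name and credentials are placeholders.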

Cloudera Manager Roadmap

• Cloudera Manager 4.1 – Released 10/24
  • Platform Support for CDH4.1
  • Cloudera Impala management & monitoring
  • New monitoring – ZooKeeper, Flume NG
  • Maintenance Mode
  • Host Decommissioning
  • Several Usability Enhancements
• Cloudera Manager 4.5 – Early 2013
  • Rolling Upgrades/ Restarts
  • Enhanced Monitoring, Cluster Heatmaps etc.
  • Role Groups Configuration
  • Cloud Support
  • Others – SNMP support, Error handling, ISV integration etc.
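
The general idea behind a rolling restart, one of the roadmap items above, is to restart hosts in small batches and stop if a batch does not come back healthy, so the cluster as a whole stays available. The sketch below shows that pattern in generic form; it is not Cloudera Manager's implementation, and the `restart` and `is_healthy` callables are hypothetical hooks supplied by the caller.

```python
def rolling_batches(hosts, batch_size):
    """Split hosts into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def rolling_restart(hosts, batch_size, restart, is_healthy):
    """Restart hosts batch by batch, halting at the first unhealthy batch.

    Returns (completed_hosts, failed_batch); failed_batch is empty when
    every batch came back healthy.
    """
    completed = []
    for batch in rolling_batches(hosts, batch_size):
        for host in batch:
            restart(host)
        if not all(is_healthy(host) for host in batch):
            return completed, batch
        completed.extend(batch)
    return completed, []


# Hypothetical hosts and hooks: "node3" fails its health check after restart.
hosts = ["node1", "node2", "node3", "node4", "node5"]
restarted = []
done, failed = rolling_restart(hosts, 2, restarted.append, lambda h: h != "node3")
print(done, failed)
```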

Why Cloudera Manager?

• Simple: End-to-End Hadoop administration in a single tool
• Intelligent: Manages Hadoop at a system level – Cloudera’s experience realized in software
• Efficient: Simplifies complex workflows & makes administrators more productive
• Best-in-Class: The only enterprise-grade Hadoop management application available

Next Steps

• Try out the FREE edition of Cloudera Manager
  • Download from: http://www.cloudera.com/products-services/tools/
  • Support available via scm-users@cloudera.org
• For Cloudera Enterprise subscriptions, please contact: sales@cloudera.com

Q&A

Cloudera Manager

Key Features

Install A Cluster In 3 Simple Steps

1. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
2. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
3. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.

• View Service Health & Performance
• Get Host-Level Snapshots
• Monitor & Diagnose Cluster Workloads
• Gather, View & Search Hadoop Logs
• Track Events From Across The Cluster
• Report On System Performance & Usage
• Visualize Health Status With Heatmaps
• Manage Multiple CDH Clusters
• Easily Configure High Availability
• Set The Time Context Globally
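
Setting a global time context amounts to placing events from different sources on one timeline and viewing a single window of it. The sketch below shows one way to do that with `heapq.merge` from the Python standard library. The event tuples, source names and messages are hypothetical, and each input stream is assumed to be already sorted by timestamp.

```python
import heapq
from datetime import datetime


def merge_timeline(*streams):
    """Merge time-sorted (timestamp, source, message) streams into one list."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


def in_window(timeline, start, end):
    """Keep only the events whose timestamp falls within [start, end]."""
    return [event for event in timeline if start <= event[0] <= end]


# Hypothetical events from three sources.
jobs = [
    (datetime(2012, 10, 25, 9, 0), "job", "job_0001 started"),
    (datetime(2012, 10, 25, 9, 30), "job", "job_0001 failed"),
]
config = [(datetime(2012, 10, 25, 9, 10), "config", "heap size changed")]
logs = [(datetime(2012, 10, 25, 9, 25), "log", "OutOfMemoryError on node7")]

timeline = merge_timeline(jobs, config, logs)
window = in_window(timeline, datetime(2012, 10, 25, 9, 5), datetime(2012, 10, 25, 9, 30))
for timestamp, source, message in window:
    print(timestamp.time(), source, message)
```

Seeing a configuration change, a log error and a job failure next to each other in time order is what makes the correlation useful for diagnosis.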
