
Cognitive Blueprints with Spectrum Conductor

Daniel Reiberg | reiberg@de.ibm.com

Agenda

o Cognitive cross industries

o Cognitive definition

o Challenges

o IBM Systems Blueprint

o Deep Learning

o Workload Management

o Native Service Management

o Resource Management & Orchestration

o Summary

Cognitive Cross industries

IBM Systems

Deep Learning, Cognitive and AI Overview:

Automotive and Transportation
• Autonomous driving:
• Pedestrian detection
• Accident avoidance
Automotive, trucking, heavy equipment, Tier 1 suppliers

Financial Services
• Sentiment analysis
• Market prediction
• Fraud/Risk
Retail and investment banks, capital markets firms

Broadcast, Media and Entertainment
• Captioning
• Search
• Recommendations
• Real time translation
Consumer facing companies with media streaming, or real time content

Consumer Web, Mobile, Retail
• Image tagging
• Speech recognition
• Natural language
• Sentiment analysis
Hyperscale web companies, large retailers

Security and Public Safety
• Video surveillance
• Image analysis
• Facial recognition and detection
Local and national police, public and private safety/security

Medicine and Biology
• Drug discovery
• Diagnostic assistance
• Cancer cell detection
Pharmaceutical, medical equipment, diagnostic labs

Cognitive Definition


AI & Cognitive Definitions

o Artificial Intelligence (AI)

• Intelligence exhibited by machines or software

o Cognitive Computing

• Computing systems exhibiting acts of perception, memory, judgment or reasoning

o Machine Learning (ML)

• Type of AI that enables computers to learn without being explicitly programmed

o Deep Learning (DL)

• Type of ML, based on neural networks loosely modeled after the brain

• Learns features and representations of data

o Training

• Neural “inspired”, fed by millions of data points

• Repetition drives weighting and connections

o Inference

• Classifying a new data point by inferring its similarity to the trained model
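The split between training and inference can be illustrated with a deliberately tiny sketch. It is not a neural network: a nearest-centroid classifier stands in for the trained model, and the sample points and the labels `benign` and `suspicious` are hypothetical.

```python
from math import dist
from statistics import fmean

def train(samples):
    """Build a toy 'model': one centroid (mean point) per label."""
    by_label = {}
    for point, label in samples:
        by_label.setdefault(label, []).append(point)
    return {
        label: tuple(fmean(axis) for axis in zip(*points))
        for label, points in by_label.items()
    }

def infer(model, point):
    """Classify a new point by the label of its closest centroid."""
    return min(model, key=lambda label: dist(model[label], point))

# Hypothetical training data: (features, label) pairs.
training_data = [
    ((1.0, 1.0), "benign"),
    ((1.5, 0.5), "benign"),
    ((8.0, 9.0), "suspicious"),
    ((9.0, 8.0), "suspicious"),
]
model = train(training_data)
print(infer(model, (2.0, 1.0)))
```

Training happens once over all samples; inference is a cheap lookup against the resulting model, which is why the two phases are usually sized and scheduled differently.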

Cognitive Challenges


1. Challenge: Build the environment

Cognitive environments consist of many (open source) components


Time to value matters

Just because you can build a cognitive environment from scratch doesn’t mean you should!


Many changes / new versions

Commits within projects


2. Challenge: Managing cognitive applications

Creating infrastructure silos to accommodate applications is inefficient

Many new solution workloads in addition to existing apps

Leads to costly, complex, siloed, under-utilized infrastructure and replicated data

Compliance

Trade Surveillance

Counterparty Credit Risk Modeling

Distributed ETL, Sentiment Analysis

Low Utilization = Higher cost

• Different LOBs

• Multiple Spark versions

• Different notebooks and versions

• Security, governance

• Application SLA

• DEV, UAT, PROD

• Different data sources, e.g., HDFS, Cassandra

• Existing applications

• Siloed organization

• Limited by technology

• It is new!

• …

Cognitive IBM Blueprint


AI Frameworks

Notebook

Scheduler

Resource Management

Distributed File System

Client Options for Spark Adoption in the Enterprise

Ambari

Impala

Slider

Navigator

Zookeeper

Falcon

Accumulo

Phoenix

Ranger

Hadoop

“Data Lake First” Analytics

• Install-base Hadoop vendors promote this approach

• Management complexities and storage-rich server costs are driving customers to look at alternative approaches

• Spark life cycle management is very difficult

Spectrum Conductor with Spark: “Analytics First”

• Provides an alternative approach from IBM, focused on a Spark-centric future

• Leverages existing HDFS data as a data store, but is not dependent on it

• Easily manages the Spark life cycle

• Efficiently uses and trains AI models


Why Spark?

1. Unified Analytics Platform

• Widely understood programming logic

2. Performance (faster than Hadoop MR)

• Up to 100x faster in memory, 10x on disk

3. Data Agnostic

• Can access diverse data sources

4. Rich Set of Certifications & Expanding Ecosystem

5. Fast evolution

• Most active Apache open source project

Spark is a fast and general engine for large-scale data processing


Common Use Cases for Spark

Spark is implemented inside many types of products, across a multitude of industries and organizations

Source: 2015 Databricks Spark Survey


IBM turnkey solution for cognitive

Business Problem

• Organizations looking to introduce Deep Learning applications leveraging their existing data to enhance their business and applications.

• Data Scientists are moving from initial individual experiments to needing an enterprise Deep Learning platform as production use cases are deployed.

Solution

• Integrated solution of open source frameworks, Deep Learning tools, and a foundational big data platform on Cognitive systems optimized for DL workloads

• Combination of Conductor with Spark, Deep Learning Module, PowerAI and Power Minsky

Our Value

• High-performance, Spark-centric analytic environment with a disaggregated storage and compute architecture.

• Marriage of the best of Spark and HPC to achieve accuracy for ML/DL applications with highest performance and at lowest cost

• Enterprise-class multi-tenant platform that provides ease of use to data scientists and efficiency to admins.

• Extensible to new DL frameworks and preferred Data Science user tools.

x86 + Power (optimized)

MLlib, GraphX

Conductor with Spark DL Module

User Application and Data Science Tools

Spectrum Scale

Spectrum Conductor with Spark

Tensorflow, Caffe, Theano, Torch7

Make a scalable and efficient Big Data DL platform accessible to enterprises


Enterprise Class Spark Solution

Red Hat Linux

Spark Workload Management

Resource Management & Orchestration

…x86

Native Services Management

IBM Spectrum Conductor with Spark

Spark SQL, Spark Streaming, GraphX, MLlib / DL

DL Module

Tensorflow Caffe Theano Torch7

Deep Learning Model & Data Management


Data Preparation, Model Management & Training

Import data from different formats

Transform, split and shuffle data

Training / Hyper-parameter Search & Tuning


Data Preparation for Deep Learning

Tumor Proliferation Assessment – mitosis detection

Images from an electron microscope

Size of image: 70K * 60K

Framework    Format     Input Size (Faster R-CNN)
Caffe        LMDB       1K*1K
TensorFlow   TFRecord   1K*1K

Data Transformation

Data Distribution among training, validation and testing

Data Shuffle
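The transform, distribute and shuffle steps can be sketched in a few lines. This is only an illustration under stated assumptions: "70K * 60K" is read as 70,000 by 60,000 pixels, "1K" as 1,000 pixels, and the 80/10/10 split and the helper names are hypothetical. The sketch works on tile coordinates only and does not write LMDB or TFRecord files.

```python
import random

def tile_origins(width, height, tile):
    """Yield the top-left (x, y) corner of each tile covering the image."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y)

def shuffle_and_split(items, percents=(80, 10, 10), seed=0):
    """Shuffle items, then cut them into training, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_train = len(items) * percents[0] // 100
    n_val = len(items) * percents[1] // 100
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )

# Assumed reading of the slide: one 70,000 x 60,000 image cut into 1,000 x 1,000 tiles.
tiles = list(tile_origins(70_000, 60_000, 1_000))
train_set, val_set, test_set = shuffle_and_split(tiles)
print(len(tiles), len(train_set), len(val_set), len(test_set))
```

Shuffling before the split keeps neighbouring tiles from clustering in one subset, which matters when adjacent regions of the same image look alike.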

Workload Management & Monitoring



Spark Shared Services Model – On-Premise Spark Cloud

Physical view: Spectrum Conductor with Spark installed on each Linux Server

Logical view: Users (groups) have their own Spark cluster and they are isolated, protected, secured by Spark Instance Groups – Managed by SLA

[Figure: Spark shared services model. Labels: Data scientist, Researcher, IT or Data Warehouse, Administrator; Virtual Spark cluster (PaaS); workloads Customer behavior, Trend analysis, HPC, Marketing, Fraud detection, ETL / Batch; instance groups #1 to #5; Management Nodes Pool and Compute Nodes Pool on Linux; Spectrum Scale; Web console: Create Spark instance group.]


Workload Management Features Summary

• Lifecycle management for Spark, AI frameworks and Data Science Tools

(Downloads for different versions from FixCentral and DevOps pages)

• Fast scheduling with Spark Session Scheduler (see benchmark)

• GPU aware (utilization of NVLink on Power)

• Shared RDD

• Multi-tenancy

• Fine-grained control of the life cycle of Spark binaries, notebook updates, deployment, resource plans, reporting, monitoring, log retrieval, and execution of either notebook or batch submissions

• Runtime isolation with Spark instance groups

(Driver/executor processes are owned by the submitter)

• Data at Rest isolation

(Data and logs can only be accessed by the owner)

• Encryption

• SSL with daemon authentication for all communications within IBM Spectrum Conductor

• Storage encryption when combined with IBM Spectrum Scale


Why Conductor: Performance – most recent STAC Benchmark

Key Points

• Conductor continues to demonstrate performance and throughput leadership, driving ROI for clients

UC1 Throughput

• 56% higher than YARN

• 57% higher than Mesos

UC4 Throughput

• 224% higher than YARN


Competitive advantage through faster analytics

41% greater throughput than Spark with YARN

57% greater than Spark with Mesos

                      Platform Conductor for Spark   Spark / YARN    Spark / Mesos
When minutes count    10 minutes                     14.1 minutes    15.7 minutes
At quarter-end        80 hours                       112.8 hours     125.6 hours
Product development   26 weeks                       36.7 weeks      40.8 weeks

Source: STAC Report: Spark Resource Managers, Phase 1 (March 28, 2016)
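The elapsed times in the table are consistent with scaling each Conductor baseline by the quoted throughput advantage (41% over YARN, 57% over Mesos). The sketch below reproduces that arithmetic; the function and dictionary names are illustrative, and the scaling rule is an assumption that happens to match the published figures.

```python
def elapsed_on_slower(baseline_time, throughput_gain_pct):
    """Time a slower system needs for work the faster one finishes in baseline_time."""
    return baseline_time * (1 + throughput_gain_pct / 100)

# Throughput advantage of Conductor over each resource manager, in percent.
GAINS = {"Spark / YARN": 41, "Spark / Mesos": 57}

# Baseline durations on Conductor, with their units.
SCENARIOS = {
    "When minutes count": (10, "minutes"),
    "At quarter-end": (80, "hours"),
    "Product development": (26, "weeks"),
}

for name, (baseline, unit) in SCENARIOS.items():
    for system, gain in GAINS.items():
        slower = round(elapsed_on_slower(baseline, gain), 1)
        print(f"{name}: {baseline} {unit} on Conductor, {slower} {unit} on {system}")
```

For example, 80 hours at quarter-end times 1.41 gives 112.8 hours on Spark / YARN, the value shown in the table.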

Note: IBM is an active contributor in the Mesos community, helping to advance its capabilities and integration with IBM solutions

Monitoring and Reporting

• Integrated Elasticsearch, Logstash, & Rave for customizable monitoring

• Built-in monitoring metrics

• Cross Spark Instance Groups

• Cross Spark Applications within Spark Instance Group

• Within Spark Application

• Built-in monitoring inside Zeppelin Notebook

• Basis for charge back models

Service Management


Application / Service composition

✓ Service and application definition

✓ Service life cycle management

✓ Integration with OS containers (cgroups, Docker)

✓ Complex service dependency

✓ HA, Persistency, virtual IP mgmt

✓ Elastic service pool

✓ Stateful vs. stateless services

✓ API & scriptable interface


Application management & monitoring across the whole life cycle

✓ Easily monitor and manage multiple application instances from a single interface

✓ Elastic service pools

✓ Multiple triggers for “grow/shrink”

✓ Dynamic services deployment

✓ Resource sharing among “long running” services and short “tasks/jobs”

✓ Applications organized hierarchically, mapping to the organization hierarchy

✓ Complex services can be managed by personnel not familiar with the details of each service

✓ System services as well as complex application frameworks
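How a "grow/shrink" trigger on an elastic service pool might behave can be sketched as a simple threshold rule. Everything here is hypothetical: the `PoolPolicy` fields, the thresholds and the one-instance step are illustrative assumptions, not the Spectrum Conductor API or its actual trigger logic.

```python
from dataclasses import dataclass

@dataclass
class PoolPolicy:
    """Hypothetical scaling policy for one elastic service pool."""
    min_instances: int = 1
    max_instances: int = 8
    grow_above: float = 0.80    # utilization that triggers "grow"
    shrink_below: float = 0.30  # utilization that triggers "shrink"

def next_size(current, utilization, policy):
    """Return the pool size after applying the grow/shrink thresholds once."""
    if utilization > policy.grow_above:
        return min(current + 1, policy.max_instances)
    if utilization < policy.shrink_below:
        return max(current - 1, policy.min_instances)
    return current

policy = PoolPolicy()
print(next_size(2, 0.90, policy))  # busy pool: grow by one instance
print(next_size(2, 0.10, policy))  # idle pool: shrink by one instance
```

Keeping a gap between the two thresholds avoids a pool that flips between growing and shrinking on every measurement.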

Resource Management & Orchestration


Resource Management

Manage, monitor and report on infrastructure usage through a single interface

✓ Monitor how assets are being used across one or more facilities

✓ Partition resources to support multiple application instances or lines of business

✓ Enforce security policies enabling secure multitenant deployments

✓ Track resource usage over time for show-back or chargeback accounting purposes

✓ Integration in enterprise monitoring via SNMP, SOAP or REST API
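Show-back or chargeback accounting comes down to aggregating tracked usage per tenant and pricing it. The sketch below assumes a hypothetical record layout of (tenant, cores, hours) and a flat rate per core-hour; the tenant names and the rate are made up for illustration.

```python
from collections import defaultdict

def chargeback(usage_records, rate_per_core_hour):
    """Sum core-hours per tenant and price them at a flat rate."""
    core_hours = defaultdict(float)
    for tenant, cores, hours in usage_records:
        core_hours[tenant] += cores * hours
    return {
        tenant: total * rate_per_core_hour
        for tenant, total in core_hours.items()
    }

# Hypothetical usage log: (tenant, cores used, hours held).
records = [
    ("risk", 16, 2.0),
    ("marketing", 4, 5.0),
    ("risk", 8, 1.0),
]
bill = chargeback(records, rate_per_core_hour=0.5)
print(bill)
```

A show-back report would print the same totals without invoicing them, so the aggregation step is shared by both models.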

Summary

Software Defined Cognitive Infrastructure Benefits

Eliminate Silos
Multi-tenant integrated application and data fabric

Simplified Administration
Single-pane-of-glass management and monitoring of shared services

Improve data availability
Global shared access to ensure data is available right when it’s needed

Minimize Deployment time
Automated deployment of physical and virtualized resources

Intelligent Resourcing
Run application workload in the most flexible & efficient ways possible

IBM Software Defined Infrastructure

Heterogeneous Infrastructure Support

Workload Aware Scheduling

Shared Resource Management

High Performance Analytics: Risk Analytics

High Performance Computing: Design/Simulation/Modeling

On-premises, On-cloud, Hybrid Infrastructure

‘New-gen Workloads’: Hadoop, Spark, Containers

Global Data Management

Disk, Flash, Tape, Power, x86, Linux on z, Docker, VM, ARM, SPARC, GPU

Thank you.

IBM Systems

ibm.com/systems
