Computerworld Big Data Forum 2015

ComputerWorld HK 2015: Systems of Insights - making big data analytics more consumable. Steven Sit, Director of Products, Open Source Based Analytic Systems. © 2013 IBM Corporation, © 2015 IBM Corporation.

Upload: steven-sit

Post on 14-Jan-2017


TRANSCRIPT

Page 1: Computerworld Big Data Forum 2015

ComputerWorld HK 2015

Systems of Insights - making big data analytics more consumable

Steven Sit
Director of Products, Open Source Based Analytic Systems

Page 2: Computerworld Big Data Forum 2015

Leaders are Leveraging Big Data to Deliver Actionable Insights at the Point of Impact

Extend & Integrate

Big Data and Analytics

Actionable Insights @ Point of Impact

Operational Systems
• Smarter Infrastructure
• Security Intelligence
• Enterprise Applications

Systems of Engagement
• Mobile Commerce
• Call Center
• Social Business

Page 3: Computerworld Big Data Forum 2015

Big Data Platform Capabilities
• Information Ingestion and Operational Information
• Landing and Archive Zone
• Real-time In-memory Zone
• Enterprise Warehouse & Mart Zone
• Analytic Appliances
• Governance, Security and Business Continuity

All Data Sources
• Streaming Data
• Text Data
• Applications Data
• Time Series
• Geo Spatial
• Relational
• Social Network
• Video & Image

Advanced Analytics Applications
• Cognitive: Learn Dynamically?
• Prescriptive: Best Outcomes?
• Predictive: What Could Happen?
• Descriptive: What Has Happened?
• Exploration & Discovery: What Do You Have?

• Automated Process
• Case Management
• Analytic Applications
• Watson
• Cloud Services
• ISV Solutions
• Alerts

Page 4: Computerworld Big Data Forum 2015

Common Big Data Use Cases

• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time

Page 5: Computerworld Big Data Forum 2015

Big Data to Improve Health Care for Millions of Patients

Independence Blue Cross is leveraging Big Data to improve the lifestyle of those who are ill:

• Map complex referral networks to identify cost-efficient providers

• Identify patients who are at the highest risk of being re-hospitalized within a short period of time

• Manage chronic disease more effectively, for example through early detection of diabetes

• Analyze clinical data and scanned images to identify hip implant patients who have high-risk exposure to complications such as metallosis, infection and dislocation

Page 6: Computerworld Big Data Forum 2015


Page 7: Computerworld Big Data Forum 2015


Page 8: Computerworld Big Data Forum 2015


Page 9: Computerworld Big Data Forum 2015

Rain detected *vibrate windshield*

Rubbing sound *publish event*

Part failure prediction

Page 10: Computerworld Big Data Forum 2015

75%

Drivers reduced fuel consumption with better driving behavior

• Improve safety & lower insurance premiums

• Preventive maintenance to improve car reliability

• Provide services such as accident and weather alerts

Page 11: Computerworld Big Data Forum 2015

2.7M leads for upsell based on customer profiles

• Better accuracy in targeted marketing activities based on individual interests, intentions and goals

• Protect individual privacy in profiles

Page 12: Computerworld Big Data Forum 2015

IBM Research

Counterparty Relationships from Regulatory Filings

SEC/FDIC Filings of Financial Companies (Forms 10-K, 8-K, 10-Q, DEF 14A, 3/4/5, 13F, SC 13D, SC 13G, FDIC Call Reports)
• Annual Report
• Loan Agreement
• Proxy Statement
• Insider Transaction

Extract, Integrate
• Counterparty Relationships
• Loan Exposure
• Company
• Person

• Over 2,500 financial companies
• Over 33,000 key officials in financial companies
• Sample set of 1 million documents
• Filing timeline: 2005 to 2014

• Regulatory
• Systemic Risk
• Investment Decisions

Page 13: Computerworld Big Data Forum 2015
Page 14: Computerworld Big Data Forum 2015

64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Mr?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2004:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2004:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
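The records on this slide appear to follow the Apache Common Log Format (host, identity, user, timestamp, request, status, bytes). As a rough illustration of how such semi-structured text can be turned into fields for analysis, here is a minimal Python sketch. The names `LOG_PATTERN`, `parse_line` and `SAMPLE` are assumptions made for this example and do not come from the presentation.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [timestamp] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the size field means no body was returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Three of the records shown on the slide.
SAMPLE = [
    '64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Mr?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846',
    '64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523',
    '64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291',
]

records = [r for r in map(parse_line, SAMPLE) if r is not None]
status_counts = Counter(r["status"] for r in records)
total_bytes = sum(r["size"] for r in records)
print(status_counts, total_bytes)
```

Once each record is a dictionary of typed fields, it can be loaded into a table and queried alongside relational data.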

Page 15: Computerworld Big Data Forum 2015

SELECT FNAME,
       LNAME,
       (SELECT NAME
          FROM DEPARTMENT AS DEP
         WHERE DEP.DEPT_ID = EMP.DEPT_ID)
  FROM EMPLOYEE AS EMP
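The query on this slide uses a correlated scalar subquery to look up each employee's department name. The sketch below runs it against an in-memory SQLite database and compares it with a LEFT JOIN. The table contents are invented for illustration, the ORDER BY clause is added only to make the comparison deterministic, and the equivalence with the join assumes DEPT_ID is unique in DEPARTMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPT_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, DEPT_ID INTEGER);
    -- Hypothetical rows, not taken from the presentation.
    INSERT INTO DEPARTMENT VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO EMPLOYEE VALUES
        ('Ada', 'Lovelace', 1),
        ('Grace', 'Hopper', 2),
        ('Alan', 'Turing', NULL);
""")

# The slide's query, with ORDER BY added for a stable row order.
subquery_sql = """
    SELECT FNAME, LNAME,
           (SELECT NAME FROM DEPARTMENT AS DEP WHERE DEP.DEPT_ID = EMP.DEPT_ID)
      FROM EMPLOYEE AS EMP
     ORDER BY LNAME
"""

# A LEFT JOIN formulation that also keeps employees without a department.
join_sql = """
    SELECT EMP.FNAME, EMP.LNAME, DEP.NAME
      FROM EMPLOYEE AS EMP
      LEFT JOIN DEPARTMENT AS DEP ON DEP.DEPT_ID = EMP.DEPT_ID
     ORDER BY EMP.LNAME
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows)
```

Both forms return one row per employee, with a NULL department name where no match exists.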


Page 16: Computerworld Big Data Forum 2015

[Figure: elements X1, X2, X3, …, Xn and Y1, Y2, Y3, …, Yn, with X2, Y5 and Y6 shown separately]


Page 17: Computerworld Big Data Forum 2015


Page 18: Computerworld Big Data Forum 2015

The current ecosystem is challenged and slowed by fragmented and duplicated efforts.

The ODP Core will take the guesswork out of the process and accelerate many use cases by running on a common platform.

This frees enterprises and ecosystem vendors to focus on building business-driven applications.

Page 19: Computerworld Big Data Forum 2015

Open Data Platform – Stakeholders Across the Hadoop Spectrum

Representation across the Hadoop ecosystem…

• Hadoop distribution vendors
• Software application providers
• Systems integrators and consultants
• Hardware vendors

… who all believe in the need for a community-based effort to standardize Hadoop, which will lead to improved adoption

Page 20: Computerworld Big Data Forum 2015

The Open Data Platform Initiative Will Benefit Customers

ODP consortium’s goal: anyone using a Hadoop distribution based on the ODP Core will be able to develop Hadoop products or apps with assurances of seamless deployment and compatibility.

Apache Hadoop Open Source Ecosystem
• HBase
• Spark
• Flume
• Hive
• Pig
• Sqoop
• HCatalog
• Solr/Lucene
• HDFS
• YARN
• MapReduce
• Ambari
• Zookeeper
• Oozie
• Knox
• Slider

Initial ODP Scope

• Open Platform 4.0 with Apache Hadoop
• Pivotal HD 3.0
• HDP 2.2

Page 21: Computerworld Big Data Forum 2015

R, Text, SQL, Sheets, Match

Page 22: Computerworld Big Data Forum 2015

Built on the IBM Open Platform with Apache Hadoop

IBM BigInsights Enterprise Management
• POSIX Distributed Filesystem
• Multi-workload, Multi-tenant scheduling

IBM BigInsights Analyst
• Big SQL
• BigSheets

IBM BigInsights Data Scientist
• Big SQL
• BigSheets
• Big R + ML
• Text Analytics

Page 23: Computerworld Big Data Forum 2015

In-Memory Performance

Ease of Development
• Easier APIs: Python, Scala, Java
• Resilient Distributed Datasets
• Unify processing

Polyglot Workloads
• Batch
• Interactive
• Iterative Algorithms
• Micro-batch

Enterprise Platform
• Reliability, Resiliency, Security
• Multiple data sources and applications
• Multiple users
• Unlimited Scale

Wide Range of Applications
• Files
• Databases
• Semi-structured

CHALLENGES

• Intensive Java programming and knowledge of parallelism are required
• Few abstractions are available, and those that exist perform poorly
• No in-memory framework: when tasks complete, data sets are no longer in memory
• Each map task is a disk write, and each new map is a disk read, in a workflow
• Suitable for batch, which is what it was built for, but use cases are changing
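Assuming these challenges refer to Hadoop MapReduce, the programming model behind them can be sketched in a few lines of Python. This is a single-process illustration only, not the Hadoop API. The function names are hypothetical, and the in-memory dictionary stands in for the intermediate data that a real job would write to and read from disk between phases.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and yield (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group values by key. In a real cluster this step moves data between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def status_mapper(line):
    # In a Common Log Format line the HTTP status is the second-to-last field.
    fields = line.split()
    yield fields[-2], 1


def count_reducer(key, values):
    return sum(values)


# Three log records of the kind shown earlier in the deck.
lines = [
    '64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291',
    '64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352',
    '64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Mr?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846',
]

counts = reduce_phase(shuffle_phase(map_phase(lines, status_mapper)), count_reducer)
print(counts)
```

Chaining several such jobs means repeating the full map, shuffle and reduce cycle each time, which is the workflow cost the slide points to.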

Page 24: Computerworld Big Data Forum 2015

Apache Spark is a fast, general-purpose, easy-to-use cluster computing system for large-scale data processing

- Fast
• Leverages aggressively cached in-memory distributed computing and JVM threads
• Faster than MapReduce

- Generality
• Covers a wide range of workloads
• Provides SQL, streaming and complex analytics

- Ease of use (for programmers)
• Spark is written in Scala, an object-oriented, functional programming language
• Scala, Python and Java APIs
• Scala and Python interactive shells
• Runs on Hadoop, Mesos, standalone or cloud

Logistic regression in Hadoop and Spark (chart from http://spark.apache.org)
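Logistic regression is a common benchmark for this comparison because training is iterative: every iteration scans the same dataset again, so keeping that dataset in memory avoids repeated disk reads. The plain-Python sketch below shows only that access pattern. It is not Spark code, and the toy data, learning rate and iteration count are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train(points, iterations=200, learning_rate=0.5):
    """Batch gradient descent for logistic regression on (features, label) pairs."""
    weights = [0.0] * len(points[0][0])
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        # Every iteration makes a full pass over the same in-memory dataset.
        for features, label in points:
            error = sigmoid(dot(weights, features)) - label
            for j, value in enumerate(features):
                gradient[j] += error * value
        weights = [w - learning_rate * g / len(points) for w, g in zip(weights, gradient)]
    return weights


def predict(weights, features):
    return 1 if sigmoid(dot(weights, features)) >= 0.5 else 0


# Toy data: features are (bias, x); the label is 1 when x is positive.
points = [((1.0, -2.0), 0), ((1.0, -1.0), 0), ((1.0, 1.0), 1), ((1.0, 2.0), 1)]

weights = train(points)
predictions = [predict(weights, features) for features, _ in points]
print(predictions)
```

With 200 iterations the inner loop reads the dataset 200 times, which is where a disk-based workflow pays its cost on each pass.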

Page 25: Computerworld Big Data Forum 2015

Spark Resilient Distributed Datasets

[Figure: three slave nodes, each holding data blocks and RDD partitions]
• Slave node 1: c3, d2, a2, b1; partition3, partition1, partition2
• Slave node 2: c2, d1, a1, b2; partition1, partition3
• Slave node 3: c1, d2, a3, b3; partition2, partition2, partition1

RDD1, RDD2, RDD3

Spark RDD: in-memory distribution

HDFS: on-disk distribution
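The idea behind a resilient distributed dataset can be illustrated with a toy class: data is split into partitions, transformations are recorded as lineage instead of being run immediately, and a lost in-memory partition is rebuilt from its source block by replaying that lineage. This is a conceptual sketch under those assumptions, not Spark's implementation, and `ToyRDD` with its methods is a hypothetical name.

```python
class ToyRDD:
    """A partitioned dataset plus the lineage needed to rebuild any partition."""

    def __init__(self, source_partitions, transforms=()):
        self.source_partitions = source_partitions  # stands in for on-disk blocks
        self.transforms = tuple(transforms)         # lineage: functions applied in order
        self.cache = {}                             # partition index -> in-memory result

    def map(self, fn):
        # Lazy: only extend the lineage, compute nothing yet.
        return ToyRDD(self.source_partitions, self.transforms + (fn,))

    def compute_partition(self, index):
        # Recompute from the source block only if the partition is not cached.
        if index not in self.cache:
            data = self.source_partitions[index]
            for fn in self.transforms:
                data = [fn(item) for item in data]
            self.cache[index] = data
        return self.cache[index]

    def collect(self):
        return [
            item
            for index in range(len(self.source_partitions))
            for item in self.compute_partition(index)
        ]

    def lose_partition(self, index):
        # Simulate a node failure that drops one in-memory partition.
        self.cache.pop(index, None)


blocks = [[1, 2], [3, 4], [5, 6]]
squared = ToyRDD(blocks).map(lambda x: x * x)

first = squared.collect()
squared.lose_partition(1)
second = squared.collect()
print(first, second)
```

After the simulated failure only the missing partition is recomputed; the other two are served from the cache.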

Page 26: Computerworld Big Data Forum 2015

IBM Software Defined Infrastructure

Example Applications
• High Performance Analytics (Low Latency Parallel)
• Homegrown
• Hadoop / Big Data
• Application Frameworks (Long Running Services)
• High Performance Computing (Batch, Serial, MPI, Workflow)
• Homegrown

IBM Platform Computing

Workload Engines

Resource Management

Scheduling & Acceleration With Infrastructure Sharing
• MapReduce (Symphony)
• Symphony
• Application Service Controller
• LSF

Page 27: Computerworld Big Data Forum 2015

Create Apps Quickly with Prebuilt Services

• Security Services
• Web and application services
• Cloud Integration Services
• Mobile Services
• Database Services
• Big Data Services
• Internet of Things Services
• Watson Services
• DevOps Services

A full range of capabilities to suit any great idea

Choice: runtimes, services, and tooling up to you

Industry Leading IBM Capabilities
– Services leveraging the depth of IBM software
– Full range of capabilities

Completeness
– Open source platform and services
– Third party to enable key use cases

Analytic Services:
– dashDB, DataWorks, EHaaS
– Watson, Cloudant, DBaaS
– Spark as a Service (SaaS)
– +++

Page 28: Computerworld Big Data Forum 2015

IBM Announces First 20 Industry Analytics Solutions

• Behavior-based Customer Insight for Banking
• Multi-Channel Fraud Analytics for Banking
• Behavior-based Client Insight for Wealth Management
• Trade Compliance Analytics for Financial Markets
• Regulatory Compliance & Control for Financial Markets
• AML Monitoring & Analytics for Financial Markets
• Behavior-based Customer Insight for Insurance
• Producer Lifecycle & Credential Management for Insurance
• Property & Casualty Claims Fraud for Insurance
• Lift Analytics for Retail
• Social Merchandising for Retail
• Behavior-based Customer Insight for Telecommunications
• Asset Analytics for Transmission & Distribution for Energy & Utilities
• Asset Analytics for Rotational Equipment for Oil & Gas
• Asset Analytics for Robotics Equipment for Automotive
• Threat Intelligence Analysis for National Security & Defense
• COPLINK on Cloud for Law Enforcement
• Behavior-based Audience Insight for Media & Entertainment
• Social Merchandising for Consumer Products
• Customer Experience Analytics for Telecommunications

Page 29: Computerworld Big Data Forum 2015