david chancellor- maddison - ibm · market trends •forrester estimates that 100% of all large...

David Chancellor-Maddison, European Strategic Initiatives Leader for Systems

Thanks to: In Time

@[email protected] +447827084570

http://www.imdb.com/title/tt1637688/

Agenda

• Market Analysis

• Customer Examples: Sony, video, Thames Water

• Data Building Blocks

• Hadoop

• Spark

• Analytics

• IDEHS

• Q&A

Market trends

• Forrester estimates that 100% of all large enterprises will adopt Hadoop and related technologies such as Spark for big data analytics within the next two years. (01/19/2016)

• Spark will reinvigorate Hadoop and, in 2016, nine out of every 10 projects on Hadoop will be Spark-related projects. (IBM MDI, 02/08/2016)

• The global Hadoop market is expected to garner revenue of $84.6B by 2021. Europe is anticipated to witness the fastest CAGR of 65.7% during the forecast period (2016 - 2021). (Allied Market Research, 01/2016)
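To give a feel for what a 65.7% CAGR means, the sketch below compounds that rate over the forecast period. It assumes five compounding years (2016 to 2021) and a starting index of 1.0. The function name and the base value are illustrative and do not come from the cited research.

```python
def compound_growth(start_value: float, cagr: float, years: int) -> float:
    """Return the value after compounding `cagr` (0.657 means 65.7%) for `years` years."""
    return start_value * (1.0 + cagr) ** years


# Assumption: 2016 -> 2021 is treated as five compounding periods.
YEARS = 5
EUROPE_CAGR = 0.657  # rate quoted in the Allied Market Research bullet

# Start from an index of 1.0 so the result reads as a growth multiple.
multiple = compound_growth(1.0, EUROPE_CAGR, YEARS)
print(f"Growth multiple over {YEARS} years: {multiple:.1f}x")
```

Under these assumptions the market would end the period at roughly 12.5 times its starting size.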

IDC: 53.5% are actively using Spark for Hadoop data.

IDC# US41157616 (April, 2016) Source: IDC Hadoop and NoSQL Database Survey, N=219, March 2016

• 37.1%: We use MapReduce for all Hadoop processing (note: this includes Hive).

• 24.8%: We have been using MapReduce, but are putting all new workloads on Spark.

• 15.2%: We are migrating our MapReduce workloads to Spark.

• 13.5%: We only use Spark for Hadoop data processing.

• 9.4%: We use some other DBMS on top of Hadoop for most of our data processing (e.g., Splice Machine).

Summary: 53.5% using Spark; 37.1% just use MapReduce; the remainder use a DBMS on Hadoop.
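The 53.5% headline is the sum of the three Spark-related responses in the survey. A minimal check of that arithmetic follows; the dictionary keys are shorthand labels invented for this example, not IDC's wording.

```python
# Survey shares in percent, as read from the IDC chart.
survey = {
    "mapreduce_only": 37.1,
    "new_workloads_on_spark": 24.8,
    "migrating_to_spark": 15.2,
    "spark_only": 13.5,
    "other_dbms_on_hadoop": 9.4,
}

# The three answers that involve Spark in some form.
SPARK_KEYS = ("new_workloads_on_spark", "migrating_to_spark", "spark_only")


def total_share(shares: dict, keys) -> float:
    """Sum shares in tenths of a percent to avoid floating-point drift."""
    return sum(round(shares[k] * 10) for k in keys) / 10


spark_share = total_share(survey, SPARK_KEYS)
all_share = total_share(survey, survey.keys())
print(f"Using Spark: {spark_share}%  (all responses: {all_share}%)")
```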

Sony: Oceans of data but no insight!

Insight is lost revenue if you cannot get it in time.

Re-run report due to data anomaly

Can Eli learn and then help? (Video)

Thames Water: Customer dissatisfaction impacts revenue

https://www.youtube.com/watch?v=zVdVcZzhDF4

Data Building Blocks

There are 2 kinds of data: give me some examples

• Unstructured data

• Structured data

All Data Business Outcomes

Apache Hadoop

• Hadoop design features

– Large single file system

– Optimized for Batch and streaming reads of large files

• What is Hadoop Good at?

– Data consolidation, repository

– Cost Reduction

– 360-degree view of the customer, data exploration

• What is Hadoop bad at?

– Audience please tell me

– FB

– How did they fix it?

For more information about Hadoop, go to: https://developer.ibm.com/hadoop/blog/videos/hadoop/
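MapReduce, the batch model referenced in the IDC survey, can be sketched in a few lines. This is a single-process, plain-Python illustration of the map, shuffle and reduce phases, not the Hadoop API. The function names and sample lines are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input standing in for lines read from a large file.
sample_lines = ["big data big insight", "data in time"]
counts = reduce_phase(shuffle_phase(map_phase(sample_lines)))
print(counts)
```

On a real cluster each phase runs in parallel across many machines and intermediate results are written to disk between stages, which is one reason the model suits batch work better than interactive queries.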

Apache Spark

• Unified Analytics Platform

– Combine streaming, graph, machine learning and SQL analytics on a single platform

– Simplified, multi-language programming model

– Interactive and Batch

• In-Memory Design

– Pipelines multiple iterations on single copy of data in memory

– Superior Performance

– Natural Successor to MapReduce
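The in-memory point can be illustrated without Spark itself. The plain-Python sketch below loads a dataset once and then runs many iterations of a simple gradient-descent fit over that single in-memory copy. The loader, the data and the learning rate are all assumptions made for the example. In Spark the comparable effect comes from keeping a dataset cached across iterations.

```python
from typing import List, Tuple

load_count = 0  # counts how many times the "expensive" load runs


def load_points() -> List[Tuple[float, float]]:
    """Stand-in for an expensive read from storage (hypothetical data, y = 2x)."""
    global load_count
    load_count += 1
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def fit_slope(points: List[Tuple[float, float]],
              iterations: int = 200, lr: float = 0.05) -> float:
    """Fit y = w * x by gradient descent, reusing the same in-memory points."""
    w = 0.0
    for _ in range(iterations):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


points = load_points()     # one load, kept in memory
slope = fit_slope(points)  # 200 passes over the same copy
print(f"slope={slope:.4f}, loads={load_count}")
```

A disk-based batch engine would typically re-read its input on every pass, which is where iterative workloads such as machine learning lose the most time.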

Fast and general engine for large-scale data processing

Spark Core API: R, Scala, SQL, Python, Java

Libraries: Spark SQL, Streaming, MLlib, GraphX

Spark Technology Center

Commit 3000+ IBM engineers and researchers to Spark projects

Spark as an Optimized Analytics Engine

Extract from slideshare: http://www.slideshare.net/databricks/spark-summit-san-francisco-2016-matei-zaharia-keynote-apache-spark-20

Open Data Platform Initiative

Harmonize on ODPi

Runtime certification released – a technology sandbox and test suites

IBM Open Platform certification expected as part of v4.2

http://odpi.org/

Goal: Achieve standardization and interoperability of software from ODP members

Doubled member companies

35 technical maintainers

Be more right, more often

• Descriptive: What is happening? Discovery and exploration

• Diagnostic: Why did it happen? Reporting, analysis, content analytics

• Predictive: What could happen? Predictive analytics and modeling

• Prescriptive: What action should I take? Decision management

• Cognitive: What did I learn, what's best?

Technology enables insight in real time

• 8 highways per core

• Compression over 90%

• Massive memory bandwidth

• In-memory acceleration

• Real time equals trusted data (no more spreadsheets), which gives you the ability to expand your business; technology is the enabler, through the optimised Bluestack.

• Improved reporting starts the hyper-optimisation of data; the great thing is that it's virtually automated.

• The goal is unleashing all the latent insight across all your company's data.

Single vendor support

Up to 2x better price performance for Spark workloads*

Delivered as a fully integrated cluster ready to run

OpenPOWER innovation with IBM S812LC servers

IBM Data Engine for Hadoop and Spark

Optimized configurations for Hadoop or Spark workloads

Based on S812LC servers with up to 14*6TB disk drives per server

Optionally preloaded with IBM BigInsights and IBM Open Platform

Simplify operations – easy to deploy and manage

Adapt and scale to your changing analytics needs

OpenPOWER innovation with IBM Open Platform with Apache Hadoop for a high performance, storage dense and fully integrated cluster offering.

* All results are based on IBM internal testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM

Is Complete Insight in real time of value?

1. Contact me: [email protected]

2. Your local IBM Team or Business Partner

3. There is an SME to help you in your country

Q&A
