David Chancellor-Maddison - IBM
David Chancellor-Maddison
European Strategic Initiatives Leader for Systems
Thanks to: In Time
@[email protected]+447827084570
Agenda
• Market Analysis
• Customer Examples: Sony, video, Thames Water
• Data Building Blocks
• Hadoop
• Spark
• Analytics
• IDEHS
• Q&A
Market trends
• Forrester estimates that 100% of all large enterprises will adopt Hadoop and related technologies such as
Spark for big data analytics within the next two years. (01/19/2016)
• Spark will reinvigorate Hadoop and, in 2016, nine out of every 10 projects on Hadoop will be Spark-related
projects. (IBM MDI, 02/08/2016)
• Global Hadoop market is expected to garner revenue of $84.6B by 2021. Europe is anticipated to witness the
fastest CAGR of 65.7% during the forecast period (2016 - 2021). (Allied Market Research, 01/2016)
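As a rough illustration of what a compound annual growth rate of that size means, the sketch below compounds the 65.7% figure quoted for Europe. The five compounding periods (2016 to 2021) are an assumption about how the forecast window is counted, and the function name is illustrative only.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor implied by compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years


EUROPE_CAGR = 0.657  # from the slide (Allied Market Research, 01/2016)
YEARS = 5            # assumption: 2016 -> 2021 counted as five periods

multiple = growth_multiple(EUROPE_CAGR, YEARS)
print(f"{EUROPE_CAGR:.1%} per year for {YEARS} years is roughly a {multiple:.1f}x increase")
```

The slide gives no starting market size for Europe, so the sketch stops at the growth multiple rather than deriving a revenue figure.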
IDC: 53.5% are actively using Spark for Hadoop data.
IDC# US41157616 (April, 2016) Source: IDC Hadoop and NoSQL Database Survey, N=219, March 2016
• 37.1%: We use MapReduce for all Hadoop processing (note: this includes Hive).
• 24.8%: We have been using MapReduce, but are putting all new workloads on Spark.
• 15.2%: We are migrating our MapReduce workloads to Spark.
• 13.5%: We only use Spark for Hadoop data processing.
• 9.4%: We use some other DBMS on top of Hadoop for most of our data processing (e.g., Splice Machine).
Summary: 37.1% just use MapReduce; 53.5% using Spark; 9.4% use a DBMS on Hadoop.
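The 53.5% headline can be checked against the individual survey responses. The sketch below assumes the "using Spark" group is the three answers that mention Spark; the dictionary keys are shorthand labels, not IDC's wording.

```python
# Response shares in percent (N=219), as read from the IDC chart on the slide.
shares = {
    "mapreduce_only": 37.1,
    "new_workloads_on_spark": 24.8,
    "migrating_to_spark": 15.2,
    "spark_only": 13.5,
    "other_dbms_on_hadoop": 9.4,
}

# Assumption: "using Spark" means every answer that involves Spark at all.
spark_groups = ["new_workloads_on_spark", "migrating_to_spark", "spark_only"]

spark_total = round(sum(shares[name] for name in spark_groups), 1)
overall = round(sum(shares.values()), 1)
print(f"Using Spark: {spark_total}%  (all responses: {overall}%)")
```

Under that grouping the three Spark answers add up to the 53.5% quoted in the slide title, and the five answers together account for all respondents.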
Sony: Oceans of data but no insight!
Insight is lost revenue if you cannot get it in time.
Re-run report due to data anomaly
Can Eli learn and then help? (Video)
Thames Water: Customer dissatisfaction impacts revenue
Data Building Blocks
There are 2 kinds of data: give me some examples
• Unstructured Data
• Structured Data
All Data Business Outcomes
Apache Hadoop
• Hadoop design features
– Large single file system
– Optimized for Batch and streaming reads of large files
• What is Hadoop Good at?
– Data consolidation, repository
– Cost Reduction
– 360 view of customer, data exploration
• What is Hadoop bad at?
– Audience please tell me
– FB
– How did they fix it?
For more information about Hadoop, go to: https://developer.ibm.com/hadoop/blog/videos/hadoop/
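For readers new to the batch model Hadoop is built around, here is a minimal sketch of the map, shuffle and reduce steps as a word count. It is plain Python, not the Hadoop API; the function names and the sample input are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, reading the input once front to back."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


sample_lines = ["big data big insight", "data in time"]  # hypothetical input
word_counts = reduce_phase(shuffle_phase(map_phase(sample_lines)))
print(word_counts)
```

On a real cluster each phase runs in parallel across many machines and intermediate results are written to disk between steps, which is where the batch orientation comes from.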
What is Apache Spark
• Unified Analytics Platform
– Combine streaming, graph, machine learning and SQL analytics on a single platform
– Simplified, multi-language programming model
– Interactive and Batch
• In-Memory Design
– Pipelines multiple iterations on single copy of data in memory
– Superior Performance
– Natural Successor to MapReduce
Fast and general engine for
large-scale data processing
Spark Core API: R, Scala, SQL, Python, Java
Spark SQL, Streaming, MLlib, GraphX
Spark Technology Center
Commit 3000+ IBM engineers and researchers to Spark projects
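The in-memory design point, running multiple iterations over a single copy of the data, can be sketched without Spark itself. The example below is plain Python under stated assumptions: `CountingSource` is a hypothetical stand-in for slow storage, and the "algorithm" is a toy iterative estimate rather than anything from MLlib.

```python
class CountingSource:
    """Hypothetical stand-in for slow storage; counts full reads of the dataset."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def refine(data, estimate, step=0.5):
    """One iteration: move the estimate part of the way towards the data mean."""
    target = sum(data) / len(data)
    return estimate + step * (target - estimate)


def iterate_reloading(source, iterations):
    """Re-read the dataset on every pass, as a disk-based batch job would."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(source.load(), estimate)
    return estimate


def iterate_cached(source, iterations):
    """Read the dataset once and keep it in memory for every pass."""
    data = source.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(data, estimate)
    return estimate


disk_style = CountingSource([2.0, 4.0, 6.0])
memory_style = CountingSource([2.0, 4.0, 6.0])
result_reload = iterate_reloading(disk_style, iterations=10)
result_cached = iterate_cached(memory_style, iterations=10)
print(f"reads with reloading: {disk_style.reads}, reads with caching: {memory_style.reads}")
```

Both variants arrive at the same estimate; the difference is how many times the data has to be fetched, which is the cost iterative workloads such as machine learning pay on every pass.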
Spark as an Optimized Analytics Engine
Extract from SlideShare: http://www.slideshare.net/databricks/spark-summit-san-francisco-2016-matei-zaharia-keynote-apache-spark-20
Open Data Platform Initiative
Harmonize on ODPi
Runtime certification released – a technology sandbox and test suites
IBM Open Platform certification expected as part of v4.2
http://odpi.org/
Goal: Achieve standardization and interoperability of software from ODP members
Doubled member companies
35 technical maintainers
Be more right, more often
• Descriptive: What is happening? Discovery and exploration
• Diagnostic: Why did it happen? Reporting, analysis, content analytics
• Predictive: What could happen? Predictive analytics and modeling
• Prescriptive: What action should I take? Decision management
• Cognitive: What did I learn, what’s best?
Technology enables insight in real time
8 highways per core
Compression over 90%
Massive memory bandwidth
In-memory acceleration
• Real time equals trusted data (no more spreadsheets), which gives you the ability to expand your business; technology is the enabler, through the optimised Bluestack.
• Improved reporting starts the hyper-optimisation of data; the great thing is that it is virtually automated.
• The goal is unleashing all the latent insight across all your company's data.
Single vendor support
Up to 2x better price performance for Spark workloads*
Delivered as a fully integrated cluster ready to run
OpenPOWER innovation with IBM S812LC servers
IBM Data Engine for Hadoop and Spark
Optimized configurations for Hadoop or Spark workloads
Based on S812LC servers with up to 14 x 6 TB disk drives per server
Optionally preloaded with IBM BigInsights and IBM Open Platform
Simplify operations – easy to deploy and manage
Adapt and scale to your changing analytics needs
OpenPOWER innovation with IBM Open Platform with Apache Hadoop for a high performance, storage dense and fully integrated cluster offering.
* All results are based on IBM internal testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM
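To put the "up to 14 x 6 TB" figure in context, here is a back-of-the-envelope capacity sketch. The replication factor of 3 (the HDFS default) and the 10-server cluster size are assumptions, not values from the slides, and the calculation ignores operating system disks, temporary space and compression.

```python
DRIVES_PER_SERVER = 14   # "up to 14" drives per server, from the slide
DRIVE_TB = 6             # 6 TB per drive, from the slide
REPLICATION_FACTOR = 3   # assumption: HDFS default replication


def raw_tb(servers: int) -> int:
    """Raw disk capacity in TB for a cluster of fully populated servers."""
    return servers * DRIVES_PER_SERVER * DRIVE_TB


def usable_tb(servers: int, replication: int = REPLICATION_FACTOR) -> float:
    """Capacity left for unique data once every block is stored `replication` times."""
    return raw_tb(servers) / replication


cluster_servers = 10  # hypothetical cluster size
print(f"raw per server: {raw_tb(1)} TB")
print(f"raw for {cluster_servers} servers: {raw_tb(cluster_servers)} TB")
print(f"usable at {REPLICATION_FACTOR}x replication: {usable_tb(cluster_servers):.0f} TB")
```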
Is Complete Insight in real time of value?
1. Contact me [email protected]
2. Your local IBM Team or Business Partner
3. There is an SME to help you in your country
Q&A