1_mi_is_az_a_big_data.ppt
DESCRIPTION
Big data presentation
TRANSCRIPT
![Page 1: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/1.jpg)
© 2013 IBM Corporation
January 2013
IBM Big Data Platform Overview
Martin Pavlík
+420 731 435 [email protected]
![Page 2: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/2.jpg)
Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data
Cost-effectively manage and analyze all available data in its native form:
unstructured, structured, streaming
ERP
CRM
RFID
Website
Network Switches
Social Media
Billing
![Page 3: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/3.jpg)
BIG DATA is not just HADOOP
• Manage & store huge volume of any data: Hadoop File System, MapReduce
• Manage streaming data: Stream Computing
• Analyze unstructured data: Text Analytics Engine
• Structure and control data: Data Warehousing
• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
• Understand and navigate federated big data sources: Federated Discovery and Navigation
![Page 4: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/4.jpg)
Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements
“Big data” isn’t just a technology—it’s a business strategy for capitalizing on information resources
Getting started is crucial
Success at each entry point is accelerated by products within the Big Data platform
Build the foundation for future requirements by expanding further into the big data platform
![Page 5: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/5.jpg)
1 – Unlock Big Data
Customer need
• Understand existing data sources
• Search and navigate data within existing systems
• No copying of data
Value statement
• Get up and running quickly
• Discover and retrieve big data
• Work even with big data sources – by business users
Solution
• Vivisimo Velocity, renamed to IBM InfoSphere DataDiscovery
![Page 6: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/6.jpg)
2 – Analyze Raw Data
Customer need
• Ingest data as-is into Hadoop
• Combine it with data from DWH
• Process very large volumes of data
Value statement
• Gain new insight
• Overcome the high cost of converting data from unstructured to structured format
• Experiment with analysis on different data and combine them with other sources
Solution
• IBM InfoSphere BigInsights
![Page 7: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/7.jpg)
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users determine what question to ask
• IT structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business explores what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization
![Page 8: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/8.jpg)
InfoSphere BigInsights is more than just HADOOP
• IBM InfoSphere BigInsights is much more than Hadoop
• The IBM big data platform includes much more than IBM InfoSphere BigInsights
![Page 9: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/9.jpg)
Hadoop
• Open-source software framework from Apache
• Inspired by
– Google MapReduce
– GFS (Google File System)
• HDFS
• Map/Reduce
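The Map/Reduce model named on this slide can be illustrated with a minimal, single-process sketch. It is plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample lines are hypothetical, and a real cluster would run the map and reduce steps in parallel over HDFS blocks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input; on a cluster each line could sit on a different node.
sample_lines = ["big data is not just Hadoop", "Hadoop stores big data"]
counts = word_count(sample_lines)
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be distributed without sharing state, which is the property Hadoop relies on.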
![Page 10: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/10.jpg)
InfoSphere BigInsights
Platform for volume, variety, velocity
• Enhanced Hadoop foundation
Analytics
• Text analytics & tooling
• Application accelerators
Usability
• Web console
• Spreadsheet-style tool
• Ready-made “apps”
Enterprise Class
• Storage, security, cluster management
Integration
• Connectivity to Netezza, DB2, JDBC databases, etc.
[Diagram: editions plotted by breadth of capabilities against enterprise class]
Apache Hadoop
Basic Edition
• Free download
• Integrated install
• Online InfoCenter
• BigData Univ.
Enterprise Edition (licensed)
• Application accelerators
• Pre-built applications
• Text analytics
• Spreadsheet-style tool
• RDBMS, warehouse connectivity
• Administrative tools, security
• Eclipse development tools
• Performance enhancements
• . . . .
Can run also on top of
![Page 11: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/11.jpg)
Spreadsheet-style Analysis
• Web-based analysis and visualization
• Spreadsheet-like interface
– Define and manage long-running data collection jobs
– Analyze content of the text on the pages that have been retrieved
![Page 12: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/12.jpg)
Build a Big Data Program – MapReduce example
Eclipse tools
• For Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.
![Page 13: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/13.jpg)
JAQL – IBM’s programming language in the Hadoop world
• Jaql is a complete solutions environment supporting all other BigInsights components
• Integration point for various analytics
– Text analytics
– Statistical analysis
– Machine learning
– Ad-hoc analysis
• Integration point for various data sources
– Local and distributed file systems
– NoSQL databases
– Content repositories
– Relational sources (warehouses, operational databases)
[Diagram: BigInsights components Text Analytics, Statistical Analysis (R module), Machine Learning (SystemML), Ad-Hoc Analysis (BigSheets) and (Integration) DB2, Netezza, Streams, … on top of Jaql (Jaql Modules, Jaql Core Operators, Jaql I/O), which sits on DFS, NoSQL, RDBMS and File System]
![Page 14: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/14.jpg)
BigInsights and the data warehouse
[Diagram labels: BigInsights; Data warehouse; Traditional analytic tools; Big Data analytic applications; Filter, Transform, Aggregate]
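The Filter, Transform, Aggregate labels on this slide describe a pre-processing pipeline in front of the warehouse. The following is a minimal sketch of that idea in plain Python. The record layout, field names and step functions are assumptions made for illustration, not BigInsights APIs.

```python
from collections import defaultdict

# Hypothetical raw records, e.g. events landed as-is before any cleansing.
raw_events = [
    {"customer": "A", "type": "purchase", "amount": "120.50"},
    {"customer": "B", "type": "page_view", "amount": ""},
    {"customer": "A", "type": "purchase", "amount": "30.00"},
    {"customer": "B", "type": "purchase", "amount": "75.25"},
]


def filter_step(events):
    """Filter: keep only the records the warehouse cares about."""
    return (e for e in events if e["type"] == "purchase")


def transform_step(events):
    """Transform: turn loosely typed text fields into typed values."""
    return ((e["customer"], float(e["amount"])) for e in events)


def aggregate_step(pairs):
    """Aggregate: reduce detail rows to one total per customer."""
    totals = defaultdict(float)
    for customer, amount in pairs:
        totals[customer] += amount
    return dict(totals)


# Only the small aggregated result would be loaded into the warehouse.
warehouse_rows = aggregate_step(transform_step(filter_step(raw_events)))
print(warehouse_rows)
```

The point of the split is that the bulky raw data stays outside the warehouse and only the compact, structured result is loaded.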
![Page 15: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/15.jpg)
3 – Simplify your warehouse – SIGNIFICANTLY
Customer need
• Make performance of DWH better
• Reduce DWH administration costs
Value statement
• Speed: 10 – 100x better performance
• Simplicity: Administration costs reduced by 75% - 90%
• Scalability
• Smart system
• In-database analytics
• Out-of-the-box integration with SPSS
Solution
• IBM Netezza, renamed to PureData System for Analytics
![Page 16: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/16.jpg)
Analyst: I need to evaluate the possible relationship between client salary and overdrafts.
IT: OK. We have to evaluate a lot of statistics, set the correct db indexes and db partitioning. It will take us 5 days.
![Page 17: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/17.jpg)
IT: Done. You can run your analytical query.
Analyst: Great. Thanks a lot. I’m going to check the results.
![Page 18: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/18.jpg)
Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective.
IT: Ohhh, welcome dear friend. Understood. So, it’s …. another 5 days of our work.
Analyst: Noooo!!! It’s not possible to work here!
![Page 19: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/19.jpg)
And now with Netezza ...
![Page 20: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/20.jpg)
Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.
![Page 21: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/21.jpg)
Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective. With Netezza I can run the query immediately.
• The response time will be the same
• IT can do something else – much more useful
![Page 22: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/22.jpg)
![Page 23: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/23.jpg)
Built-In Expertise Makes This as Simple as an Appliance
Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost
![Page 24: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/24.jpg)
IBM Netezza was renamed to IBM PureData System for Analytics in October 2012
![Page 25: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/25.jpg)
Netezza Genesis in T-Mobile CZ
Proof-of-Concept Project
– New Enterprise Data Warehouse platform selection
– Comparison of existing and other platforms
– Selection Criteria
• Performance
• Operational Savings
….and the winner was: Netezza
![Page 26: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/26.jpg)
Netezza Genesis in T-Mobile CZ
Expectations
– Significant response improvement
• Faster platform means better report response
– Direct Data Availability
• Higher trust in data, one version of truth
• Aggregation reduction
• Any attribute available
– Operational Benefits
• Storage savings (no data replicas)
• Administration cost reduction (DBA)
– Infrastructure Simplification
• Lower environment complexity
![Page 27: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/27.jpg)
Netezza Genesis in T-Mobile CZ
Project Implementation
– EDW platform migration
• Netezza platform implementation
• ETL graphs/processes redesign
– BI Front-End Tool Migration
• SAP BusinessObjects implementation
• All reports redesign
Main Integration Partner: T-System CZ
![Page 28: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/28.jpg)
Netezza Genesis in T-Mobile CZ
Current Status
– All relevant ETL processing redesigned
– Parallel run of the original and Netezza platforms finished
– Netezza is now the only primary platform
![Page 29: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/29.jpg)
Real Netezza experience from T-Mobile Czech Rep.

| Report | Original Platform | Netezza |
| --- | --- | --- |
| Workflow Reporting | 2 hours | 1 minute |
| Invoicing and Payments reporting | | |
| Payment discipline of current month invoices | 33 minutes | 17 seconds |
| Overdue Debt of Invoices – in Current Month | 10 hours | 23 seconds |
| Average Monthly Invoice Figures | 50 minutes | 38 seconds |

RESPONSE TIME MASSIVELY IMPROVED
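To put the response times reported above on one scale, the short sketch below converts each pair to seconds and divides. The figures come from the slide. The dictionary layout and the variable names are only for illustration.

```python
# Response times from the comparison above, converted to seconds:
# (original platform, Netezza).
timings = {
    "Workflow Reporting": (2 * 3600, 60),
    "Payment discipline of current month invoices": (33 * 60, 17),
    "Overdue Debt of Invoices - in Current Month": (10 * 3600, 23),
    "Average Monthly Invoice Figures": (50 * 60, 38),
}

# Speed-up factor = original response time / Netezza response time.
speedups = {
    name: original / netezza for name, (original, netezza) in timings.items()
}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

This works out to roughly 120x, 116x, 1565x and 79x, so three of the four reports come in above the 10 to 100x range quoted earlier in the deck.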
![Page 30: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/30.jpg)
4 – Reduce costs with Hadoop – SIGNIFICANTLY
Customer need
• Too much data => too expensive to store and to maintain
• A big portion is kept “just in case”
• The amount of data is still growing => it’s getting more expensive
• => too expensive to have all data in a standard DWH
Value statement
• Leverage the architecture of parallel processing in Hadoop
• Hadoop uses cheap commodity HW
• Enable business users to keep working in the same or a similar way
Solution
• IBM InfoSphere BigInsights
![Page 31: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/31.jpg)
BigInsights and the data warehouse
• BigInsights: query-ready archive for “cold” warehouse data
• Data Warehouse
• Big Data analytic applications
• Traditional analytic tools
• From Cognos BI via Hive JDBC
![Page 32: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/32.jpg)
Future: The SQL interface . . . .
• Rich SQL query capabilities
– SQL '92 and 2011 features
– Correlated subqueries
– Windowed aggregates
• SQL access to all data stored in InfoSphere BigInsights
• Robust JDBC/ODBC support
• Take advantage of key features of each data source
– Leverage MapReduce parallelism OR achieve low latency
[Diagram: Application → SQL Language → JDBC / ODBC Driver → JDBC / ODBC Server → SQL interface Engine (InfoSphere BigInsights) → Data Sources: Hive tables, HBase tables, CSV files]
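The two query features named on this slide, windowed aggregates and correlated subqueries, can be tried out with the sketch below. It uses Python's built-in `sqlite3` module as a stand-in engine, not the BigInsights SQL interface. The `invoices` table and its rows are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("A", 1, 100.0), ("A", 2, 150.0), ("B", 1, 80.0), ("B", 2, 60.0)],
)

# Windowed aggregate: running total per customer, ordered by month.
running = conn.execute(
    """
    SELECT customer, month,
           SUM(amount) OVER (PARTITION BY customer ORDER BY month) AS running_total
    FROM invoices
    ORDER BY customer, month
    """
).fetchall()

# Correlated subquery: invoices above that same customer's average amount.
above_avg = conn.execute(
    """
    SELECT customer, month, amount
    FROM invoices AS i
    WHERE amount > (SELECT AVG(amount) FROM invoices WHERE customer = i.customer)
    ORDER BY customer, month
    """
).fetchall()

conn.close()
print(running)
print(above_avg)
```

Both queries are standard SQL, so the same text would in principle be portable to any engine that implements these features.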
![Page 33: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/33.jpg)
5 – Analyze Streaming Data
Customer need
• Process and leverage streaming data
• Select valuable data from the data stream for future processing
• Quickly process data that becomes useless if it’s not processed immediately
Value statement
• React in real time to take an opportunity before it expires
• Periodically adjust streaming models based on analysis of data at rest
Solution
• IBM InfoSphere Streams
[Diagram: Streaming Data Sources → Streams Computing → ACTION]
![Page 34: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/34.jpg)
Why and when to use InfoSphere Streams?
Applications needing on-the-fly processing, filtering and analysis of streaming data
• Sensors
– Environmental, Industrial, GPS, …
– Images, Videos, …
• Data Exhaust
– Network data, system logs (web server, app server), …
• High-rate transaction data
– Financial transactions
– CDRs
At least 2 criteria from the list below should be fulfilled
• Isolation
– Processing in isolation … or in limited windows (time / no. of records)
• Non-traditional formats included
– Spatial data, images, text, voice, …
• Integration challenges
– Different connection methods
– Different data rates
– Different processing requirements
• Multiple processing nodes
– Volume / rate very high => scalability required
• Sub-millisecond latency
– Immediate analysis and response
• Store & mine approach doesn’t work
– Because of very high volume of data (and its rates)
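Processing "in limited windows" by number of records, as mentioned on this slide, can be sketched with a fixed-size sliding window. This is plain Python rather than InfoSphere Streams code. The function names, the sample readings and the threshold are assumptions chosen for illustration.

```python
from collections import deque


def sliding_average(stream, window_size):
    """Yield the mean of the last `window_size` readings after each new one."""
    window = deque(maxlen=window_size)
    for reading in stream:
        window.append(reading)  # the oldest reading drops out automatically
        yield sum(window) / len(window)


def alerts(stream, window_size, threshold):
    """Yield (index, average) whenever the windowed average exceeds threshold."""
    for index, average in enumerate(sliding_average(stream, window_size)):
        if average > threshold:
            yield index, average


# Hypothetical sensor readings arriving one at a time.
sensor_readings = [10, 12, 11, 30, 35, 33, 12, 10]
triggered = list(alerts(iter(sensor_readings), window_size=3, threshold=20))
print(triggered)
```

Only the last three readings are ever held in memory, so the stream can be unbounded. This is the property that makes the store-and-mine approach unnecessary for this kind of check.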
![Page 35: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/35.jpg)
Streams and BigInsights - Integrated Analytics on Data in Motion & Data at Rest
• InfoSphere Streams: data ingest, preparation, online analysis, model validation
• InfoSphere BigInsights, Database & Warehouse: data integration, data mining, machine learning, statistical modeling
• Visualization of real-time and historical insights
Flows between the two (data and control flow):
1. Data Ingest
2. Bootstrap/Enrich
3. Adaptive Analytics Model
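One way to read the ingest, bootstrap and adaptive-model loop on this slide is sketched below: a model is fitted on stored data, applied to incoming readings, and refitted periodically as new data accumulates. The simple mean-plus-k-standard-deviations threshold, the function names and the sample values are all assumptions for illustration. The slide does not specify the model.

```python
from statistics import mean, pstdev


def fit_threshold(history, k=2.0):
    """Data-at-rest step: derive an anomaly threshold from stored readings."""
    return mean(history) + k * pstdev(history)


def score_stream(stream, history, refresh_every=4, k=2.0):
    """Data-in-motion step: flag readings, refitting the model periodically.

    Note: `history` is extended in place to mimic ingesting into the store.
    """
    threshold = fit_threshold(history, k)  # bootstrap from stored data
    flagged = []
    for count, reading in enumerate(stream, start=1):
        if reading > threshold:
            flagged.append(reading)
        history.append(reading)  # ingest the reading into the store
        if count % refresh_every == 0:  # periodic model refresh
            threshold = fit_threshold(history, k)
    return flagged, threshold


# Hypothetical stored readings and a short incoming stream.
historical = [10.0, 11.0, 9.0, 10.0]
flagged, final_threshold = score_stream([10.5, 25.0, 11.0, 9.5, 26.0], historical)
print(flagged, round(final_threshold, 2))
```

The refit after the fourth reading raises the threshold because the outlier 25.0 has entered the stored data, showing how analysis at rest feeds back into scoring in motion.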
![Page 36: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/36.jpg)
The Platform Advantage
BI / Reporting
BI / Reporting
Exploration / Visualization
Functional App
Industry App
Predictive Analytics
Content Analytics
Analytic Applications
IBM Big Data Platform
Systems Management
Application Development
Visualization & Discovery
Accelerators
Information Integration & Governance
Hadoop System
Stream Computing
Data Warehouse
BENEFITS IN DETAIL
• Increase over time
– By moving from entry to a 2nd and 3rd project
• Lowering deployment costs
– Shared components
– Integration
• Points of leverage
– Shared text analytics for Streams and BigInsights
– HDFS connectors (data integration (ETL, …), Streams)
– Accelerators built across multiple engines
![Page 37: 1_Mi_is_az_a_big_data.ppt](https://reader036.vdocuments.us/reader036/viewer/2022070419/55cf9b35550346d033a524b8/html5/thumbnails/37.jpg)
THINK