s002 smart analytics power tech
TRANSCRIPT
8/3/2019 s002 Smart Analytics Power Tech
http://slidepdf.com/reader/full/s002-smart-analytics-power-tech 1/23
© 2010 IBM Corporation
IBM Power Systems Technical University
October 18–22, 2010 — Las Vegas, NV
Session Title: IBM Smart Analytics System
Speaker Name: Doug Mack
Session: s002
Fast
Flexible
Affordable
“Our prices are lower than others. Is this sustainable given our costs, or a future threat?”
“…What is our risk this morning?”
“…Are we using our stimulus funding effectively?”
“…Do we have product issues or fraudulent claims from service?”
“…How & when should we adjust plans to reduce churn & expand share?”
“…Which treatments are ineffective and should be eliminated to lower costs?”
Analytics are cross industry
4000+ GBS Consultants with deep industry knowledge
Industry Specific Warehouse Packs for quick starts
Unlocking the Business Value of Information
47% of users don’t have confidence in their information
42% of managers use wrong information at least once a week
59% of managers miss information they should have used
Source: AIIM & Accenture Surveys, 2007
[Figure: sample record layouts from an operational database (T00032P, T01045P, T01046P, R02126P, T03140P, T05001P), each a long run of cryptic field definitions such as DFFTCA 3P 0, DFRTBB 5A, DFMNEE 25A]
Getting INFORMATION out of operational databases can be problematic
Transaction Schema
Databases designed for TRANSACTION processing
Data elements designed by developers – not meaningful to analysts
Table/File relationships not well understood – need an expert
Transactions characterized by
– Many users
– Short CPU time required per use
– Small number of records accessed/updated per transaction
Analytical Queries characterized by
– Smaller number of users
– Intensive resource utilization (I/O)
– Large number of database records accessed
– Data loads and queries occurring at the same time
Data may not be consolidated
Dirty Data
The Foundation for Analytics includes:
A Data Warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. An expanded definition for data warehousing includes tools to extract, transform, and load (ETL) data into the repository, and tools to manage and retrieve metadata.
Business Intelligence is the name for the ability to act on information that may come from a data warehouse.
Data mining is a business process based on advanced technology, which finds unknown and complex relationships in data, producing insight into business issues and predictions to improve business decisions.
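The extract, transform, and load (ETL) step mentioned above can be pictured with a small Python sketch. Everything specific here is an assumption made for illustration: the source field names (CUSNO, INVDT, QTYORD), the YYYYMMDD date encoding, and the in-memory lists standing in for the operational source and the warehouse table. It is not a description of any IBM tool.

```python
from datetime import datetime

# Hypothetical operational rows: cryptic names, numeric dates, untidy values.
source_rows = [
    {"CUSNO": " c001 ", "INVDT": 20100115, "QTYORD": "3"},
    {"CUSNO": "C002", "INVDT": 20100116, "QTYORD": "10"},
    {"CUSNO": "C002", "INVDT": 0, "QTYORD": "1"},  # invalid date, rejected below
]

def extract(rows):
    """Extract: read rows from the (here, in-memory) operational source."""
    yield from rows

def transform(row):
    """Transform: rename columns, clean values, convert to a true date.

    Returns None when the row cannot be cleaned.
    """
    try:
        invoice_date = datetime.strptime(str(row["INVDT"]), "%Y%m%d").date()
    except ValueError:
        return None
    return {
        "CUSTOMER_NUMBER": row["CUSNO"].strip().upper(),
        "INVOICE_DATE": invoice_date,
        "QUANTITY_ORDERED": int(row["QTYORD"]),
    }

def load(rows, warehouse):
    """Load: append cleaned rows to the target table; return the reject count."""
    rejected = 0
    for row in rows:
        clean = transform(row)
        if clean is None:
            rejected += 1
        else:
            warehouse.append(clean)
    return rejected

warehouse_table = []
rejected_count = load(extract(source_rows), warehouse_table)
```

The point of the sketch is the division of labor: analysts only ever see the cleaned, renamed rows in `warehouse_table`, while rejected rows are counted rather than silently dropped.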
IBM Smart Analytics System Schematic
[Figure: components shown are Operational Source Systems (Structured/Unstructured Data), ETL, Data Warehouse, InfoSphere Warehouse Cubing Services, Data/Text Mining, and Cognos 8 BI]
ETL – Extract, Transform, and Load
Data Warehouse Schema
INVOICE_LINES
  INVOICE_NUMBER 7P 0
  INVOICE_LINE_NUMBER 3P 0
  PRODUCT_NUMBER 5P 0
  CUSTOMER_NUMBER 10A
  SELLING_COMPANY 5A
  SUPPLY_WAREHOUSE 5A
  QUANTITY_ORDERED 11P 0
  QUANTITY_SHIPPED 11P 0
  TOTAL_DISCOUNT 9P 2
  NET_PRICE 9P 2
  BASE_PRICE 9P 2
  UNIT_COST 9P 2
  EXTENDED_COST 11P 2
  EXTENDED_PRICE 11P 2
  MARGIN 11P 2
  SALES_REP 5A
  COMMISSION_VALUE 7P 2
  INVOICE_DATE DATE
  SHIP_DATE DATE
  DELIVERY_DATE DATE
  INVOICE_TIME TIME
  MONTH_NUMBER 2P 0
  WEEK_NUMBER 2P 0
  LOAD_DATE DATE
CUSTOMERS
  CUSTOMER_NUMBER 10A
  CUSTOMER_NAME 35A
  ADDRESS_LINE_1 35A
  ADDRESS_LINE_2 35A
  CITY 35A
  STATE_CODE 2A
  ZIP_CODE 10A
  CONTACT_NAME 35A
  TELEPHONE 15A
  SALES_REP_DEFAULT 5A
  CUSTOMER_CATEGORY 5A
  CUSTOMER_CLASS 5A
  REGION_CODE 5A
  LOAD_DATE DATE
  LAST_CHANGE_TIME TMSTP
  STATUS_FLAG 1A
PRODUCTS
  PRODUCT_NUMBER 5P 0
  PRODUCT_DESCRIPTION 42A
  BRAND_CODE 5A
  BRAND_DESCRIPTION 20A
  ORIGIN_CODE 5A
  ORIGIN_DESCRIPTION 20A
  FAMILY_CODE 5A
  FAMILY_DESCRIPTION 20A
  COST 9P 2
  BASE_PRICE 9P 2
  PRODUCT_WEIGHT 9P 4
  PRODUCT_VOLUME 9P 4
  LOAD_DATE DATE
  LAST_CHANGE_TIME TSTP
  STATUS_FLAG 1A
Only includes the columns we care about
Dates are true date columns
Meaningful table and column names
De-normalized design reduced to only 3 tables
Complex calculations already done
Very easy to work with
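To show why a schema like this is easy to work with, here is a minimal sketch that answers "total margin by region for 2010" with one join. It uses Python's built-in sqlite3 module purely as a stand-in database; the column names come from the slide, but the SQLite types, the subset of columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (
    CUSTOMER_NUMBER TEXT PRIMARY KEY,
    CUSTOMER_NAME   TEXT,
    REGION_CODE     TEXT
);
CREATE TABLE INVOICE_LINES (
    INVOICE_NUMBER      INTEGER,
    INVOICE_LINE_NUMBER INTEGER,
    CUSTOMER_NUMBER     TEXT,
    EXTENDED_PRICE      REAL,
    MARGIN              REAL,   -- already calculated at load time
    INVOICE_DATE        TEXT    -- ISO date string; SQLite has no DATE type
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?)", [
    ("C001", "Acme", "EAST"),
    ("C002", "Globex", "WEST"),
])
conn.executemany("INSERT INTO INVOICE_LINES VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "C001", 100.0, 25.0, "2010-01-15"),
    (1, 2, "C001", 50.0, 10.0, "2010-01-15"),
    (2, 1, "C002", 200.0, 40.0, "2010-02-03"),
])

# One join, readable names, no date decoding, no margin arithmetic.
margin_by_region = conn.execute("""
    SELECT c.REGION_CODE, SUM(l.MARGIN) AS TOTAL_MARGIN
    FROM INVOICE_LINES l
    JOIN CUSTOMERS c ON c.CUSTOMER_NUMBER = l.CUSTOMER_NUMBER
    WHERE l.INVOICE_DATE BETWEEN '2010-01-01' AND '2010-12-31'
    GROUP BY c.REGION_CODE
    ORDER BY c.REGION_CODE
""").fetchall()
```

Because MARGIN is stored rather than derived and dates are real dates, the query needs no knowledge of the operational system's encodings.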
Traditional approach to Data Warehousing and Business Intelligence
Evaluate Business Intelligence Tools from several vendors
Evaluate Database Management Systems from several vendors
Evaluate Extract/Transform/Load (ETL) tools from several vendors
Select Vendors (come to contract terms)
Size System
Develop Systems Integration Plan
Develop Operational Skills/Procedures
Build Multi-vendor Support Structure
Install, Configure, Tune
Tune again and again and again
NEW: IBM Smart Analytics System 7700
2x Performance at ½ the cost (1)
Compared to 7600
– 4x more cores per module
– 50% less space and energy requirements
– 2x storage capacity per data module
Complete End to End Analytical Solution shipped in a matter of weeks versus months, reducing risk and improving time to value. Completely integrated solution based on powerful warehouse infrastructure with a single point of support
Start small and grow big with proven, flexible modular design that preserves your investment and maintains optimized design as you grow
Exploit the latest POWER7 architecture
– Optimized for Analytics with Massively Parallel Processing design
– New IBM System Storage with Solid State Drive (SSD) options
– InfoSphere Warehouse 9.7 adds Oracle compatibility
(1) Performance per data module, and cost comparing equivalently sized systems in Raw Data Terabytes
Cognos Module
2x performance through optimization at build time
26-40% performance improvement over POWER6
High availability for BI built into system
POWER7 Cognos Module and Control Console
20-40% Performance Improvements (1)
Optional Cognos module is refreshed with Power 740 8 core servers improving performance and supporting higher number of concurrent users per module. Design of the Cognos module includes an active-active, workload balancing architecture that is preconfigured and optimized by IBM before shipping
The 7700 control console provides another layer above the hardware and software that allows administration over the system as a whole, rather than each component individually. This console allows the user to manage and maintain software for coordinated stack updates (operating system, driver, firmware, and other components).
(1) Based on complexity of queries comparing on a core to core basis
Faster Results, Less Risk
[Figure: timeline from January to June comparing Build “from scratch” with Pre-Built across the phases Pre-implementation System Sizing, Acquire Components, Installation & Configuration, and Testing & Validation]
Months versus weeks
IBM Smart Analytics System 7700: What’s in the box?
Data Warehouse Software
– InfoSphere Warehouse
– InfoSphere Warehouse Advanced Workload Management
– Tivoli System Automation
Analytics Software Options
– Business Intelligence Module (Cognos 8 BI)
– InfoSphere Warehouse Cubing Services
– InfoSphere Warehouse Text Analytics & Data Mining
Hardware/OS
– IBM Power 740 Servers (16 core modules)
– IBM System Storage designed for the IBM Smart Analytics System, including SSD acceleration
– AIX 6.1
IBM Smart Analytics System 7700: Transparent Modular Architecture
Modular design
Foundation Structure
– Foundation Module: 1 Module
– Data Module: 1 to x Nodes
– User Module: 0 to y Nodes
– Failover Module: 0 or (x+y+1)/5 Nodes
– + SSD
Add-On Modules
– Analytics Module
– BI Module
– 3rd Party Modules
– Warehouse Packs
– ...
Scaling out to support more data or more users
Shared Nothing, Massively Parallel Design
[Figure: a Foundation Module, User Modules 2 and 3, and Data Modules 1 to 5, each drawn with its own CPUs and memory (MEM). Legend: Foundation Module, User Module, Data Module. Callout: Expand by adding additional user or data modules]
Balanced system design
– System modules with optimal processor, memory, and I/O specifications
Scale-out by adding additional system modules
– Which always include balanced I/O
Proven “best practice” for large scale data warehousing
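The shared-nothing idea can be sketched in a few lines of Python: each data module owns a disjoint slice of the rows, computes on its slice alone, and only the small partial results are combined. The hash function (CRC32), the module count of 3, and the sample rows are assumptions chosen for illustration; they do not describe how DB2 actually distributes data.

```python
from collections import defaultdict
from zlib import crc32

NUM_DATA_MODULES = 3  # assumption for this sketch

def module_for(key: str) -> int:
    """Pick the data module that owns a row by hashing its distribution key."""
    return crc32(key.encode("utf-8")) % NUM_DATA_MODULES

# Made-up (customer, amount) rows.
rows = [
    ("C001", 100.0),
    ("C002", 200.0),
    ("C003", 50.0),
    ("C001", 25.0),
    ("C004", 75.0),
]

# Each module stores only its own slice; nothing is shared between modules.
partitions = defaultdict(list)
for customer, amount in rows:
    partitions[module_for(customer)].append((customer, amount))

def local_sum(partition):
    """Work done independently on one module, using only its local rows."""
    return sum(amount for _, amount in partition)

# Every module aggregates in isolation; a coordinator adds the partial results.
partial_sums = [local_sum(partitions[m]) for m in range(NUM_DATA_MODULES)]
total = sum(partial_sums)
```

Adding a module in this picture means raising `NUM_DATA_MODULES` and redistributing rows, which is why each added module brings its own processor, memory, and I/O along with it.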
InfoSphere Warehouse is powered by DB2
DB2 offers many unique, industry leading capabilities that are advantageous to data warehousing environments
– “Shared Nothing” Architecture
– Advanced Cost Based Parallel Query Optimizer
– Flexible Partitioning
– Multi-dimensional Clustering (MDC)
– Materialized Query Tables (MQT)
– Industry leading compression
– Workload management
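Of the capabilities listed, the materialized query table is the easiest to picture: the result of an expensive aggregate query is stored as a table so later queries can read the stored answer. The sketch below only emulates that idea with Python's sqlite3 module and a manually refreshed summary table. SQLite has no MQT feature and no optimizer routing, and the table name SALES_BY_PRODUCT_MONTH and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE INVOICE_LINES ("
    "PRODUCT_NUMBER INTEGER, MONTH_NUMBER INTEGER, EXTENDED_PRICE REAL)"
)
conn.executemany("INSERT INTO INVOICE_LINES VALUES (?, ?, ?)", [
    (10, 1, 100.0),
    (10, 1, 50.0),
    (10, 2, 70.0),
    (20, 1, 30.0),
])

def refresh_summary(connection):
    """Recompute the stored aggregate (a stand-in for refreshing an MQT)."""
    connection.execute("DROP TABLE IF EXISTS SALES_BY_PRODUCT_MONTH")
    connection.execute("""
        CREATE TABLE SALES_BY_PRODUCT_MONTH AS
        SELECT PRODUCT_NUMBER,
               MONTH_NUMBER,
               SUM(EXTENDED_PRICE) AS TOTAL_PRICE,
               COUNT(*) AS LINE_COUNT
        FROM INVOICE_LINES
        GROUP BY PRODUCT_NUMBER, MONTH_NUMBER
    """)

refresh_summary(conn)

# Reports read the small precomputed table instead of scanning the detail rows.
january_product_10 = conn.execute(
    "SELECT TOTAL_PRICE FROM SALES_BY_PRODUCT_MONTH "
    "WHERE PRODUCT_NUMBER = ? AND MONTH_NUMBER = ?",
    (10, 1),
).fetchone()[0]
summary_rows = conn.execute(
    "SELECT COUNT(*) FROM SALES_BY_PRODUCT_MONTH"
).fetchone()[0]
```

In this emulation the report has to name the summary table explicitly and someone has to call `refresh_summary` after each load; those are exactly the chores a database-managed materialized table is meant to take over.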
Online Analytical Processing (OLAP): Cubing Services
InfoSphere Warehouse Cubing Services is a multidimensional analysis server that enables OLAP applications to access large data volumes stored inside a DB2 database
Benefits
Empowers users with ad hoc access to business information.
– What is the profitability for Product A across the Branches X, Y, Z?
Speed of thought access to OLAP data managed by DB2
OLAP and SQL shared access to the same information
Single point of management, maintenance, and performance tuning
Accessible via Cognos 8 BI, Microsoft Excel, Alphablox, IBM DataQuant, and Cubeware Cockpit
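The sample question about Product A and Branches X, Y, Z is a slice of a cube followed by a roll-up along one dimension. A minimal Python sketch of that operation follows; the fact rows, the two dimensions, and the helper name `profit_by` are all hypothetical, and a real OLAP server would do far more (hierarchies, caching, a query language).

```python
from collections import defaultdict

# Made-up fact rows with two dimensions (product, branch) and two measures.
facts = [
    {"product": "A", "branch": "X", "revenue": 120.0, "cost": 80.0},
    {"product": "A", "branch": "Y", "revenue": 90.0, "cost": 70.0},
    {"product": "A", "branch": "Z", "revenue": 60.0, "cost": 65.0},
    {"product": "B", "branch": "X", "revenue": 200.0, "cost": 150.0},
]

def profit_by(rows, dimension, **filters):
    """Slice the facts by the given filters, then roll profit up along one dimension."""
    totals = defaultdict(float)
    for row in rows:
        if all(row[name] == value for name, value in filters.items()):
            totals[row[dimension]] += row["revenue"] - row["cost"]
    return dict(totals)

# "What is the profitability for Product A across the Branches X, Y, Z?"
product_a_by_branch = profit_by(facts, "branch", product="A")

# The same facts viewed along a different dimension, with no filter.
profit_by_product = profit_by(facts, "product")
```

The same stored facts answer both questions, which is the sense in which OLAP and SQL access can share one copy of the information.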
Data Mining & Visualization
InfoSphere Warehouse provides real time data mining embedded in your warehouse, accessible from any application supporting SQL.
Benefits
– Discover information on customer and product associations and behaviors captured by your application
Examples
• Market basket analysis
• Customer segmentation
• Click stream analysis
• Disease and treatment analysis
• Fraud detection
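As one concrete case from the examples, market basket analysis looks for items that tend to be bought together and scores each association by support and confidence. The following is a small Python sketch of those two measures for item pairs, using made-up baskets; it is a generic illustration, not the algorithm InfoSphere Warehouse uses.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets: each set is one customer's purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# How often each item, and each unordered pair of items, appears.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]

bread_milk_support = support("bread", "milk")
butter_implies_bread = confidence("butter", "bread")
bread_implies_milk = confidence("bread", "milk")
```

In this toy data every basket with butter also has bread, so the rule "butter implies bread" has the highest possible confidence even though only half the baskets support it.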
Modeling & Design: InfoSphere Warehouse Design Studio
Embedded Data Movement & Transformation
The SQL Warehousing Tool (SQW)
Easy to Use & Package
Graphically build complex transformations within DB2
Advanced Workflow Control and Scheduling
Build your Warehouse Source to Target as one Application
Integration
Automate Text Analytics & Data Mining workloads
Ability to Natively source data from non-DB2 RDBMS
Scale up and integrate with Information Server/DataStage
Compliance
Ability to add version management
Job Monitoring
InfoSphere Warehouse provides extract, load and transform (ELT) capabilities based on the DB2 server engine. SQL Warehousing Tool applications are developed in a fully integrated graphical interface within the InfoSphere Warehouse Design Studio.
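The difference between ELT and ETL is where the transformation runs: raw rows are loaded first, and the database engine then transforms them with SQL. A minimal sketch of that ordering, again using Python's sqlite3 module as a stand-in engine, is shown below. The staging and target table names, the column names, and the sample rows are assumptions for illustration, and nothing here represents SQL generated by the SQL Warehousing Tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STAGING_ORDERS (CUSNO TEXT, QTY TEXT, PRICE TEXT);
CREATE TABLE ORDER_FACTS (
    CUSTOMER_NUMBER  TEXT,
    QUANTITY_ORDERED INTEGER,
    EXTENDED_PRICE   REAL
);
""")

# Extract + Load: raw rows land in the staging table untouched.
conn.executemany("INSERT INTO STAGING_ORDERS VALUES (?, ?, ?)", [
    (" c001 ", "3", "9.50"),
    ("C002", "10", "2.00"),
])

# Transform: one set-based SQL statement, executed inside the database engine.
conn.execute("""
    INSERT INTO ORDER_FACTS
    SELECT UPPER(TRIM(CUSNO)),
           CAST(QTY AS INTEGER),
           CAST(QTY AS INTEGER) * CAST(PRICE AS REAL)
    FROM STAGING_ORDERS
""")

order_facts = conn.execute(
    "SELECT * FROM ORDER_FACTS ORDER BY CUSTOMER_NUMBER"
).fetchall()
```

No row leaves the database during the transform step, so the work can use whatever parallelism the engine already provides.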
Oracle/Sun Database Machine (Exadata)
Exadata positioned as “it can do it all” – transaction processing and data warehousing – contrary to IBM’s “workload optimized systems”
IBM Internal Use Only
Designed with Oracle RAC shared disk clusters
Provides storage servers to Oracle database
– Linux on x86 running Oracle Enterprise Linux
For every Oracle database server there are two additional “Exadata” servers to perform the I/O, each with 12 disk drives and new Exadata software
– You have to license each disk drive in the Exadata servers @ $10,000 each for a total of $1,680,000 for a Full Rack plus 22% annual maintenance
Oracle Database Machine     ¼ Rack    ½ Rack    Full Rack
Database Servers            2         4         8
Exadata Storage Servers     3         7         14
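The $1,680,000 figure follows from the numbers on this slide: 14 storage servers in a Full Rack, 12 drives each, $10,000 per drive. The short Python check below reproduces it and the 22% annual maintenance. Applying the same rule to the ¼ and ½ Rack server counts is an extrapolation for illustration; the slide only states the Full Rack total.

```python
# Storage server counts per configuration, as listed on the slide.
STORAGE_SERVERS = {"¼ Rack": 3, "½ Rack": 7, "Full Rack": 14}
DRIVES_PER_STORAGE_SERVER = 12
LICENSE_PER_DRIVE_USD = 10_000
ANNUAL_MAINTENANCE_PERCENT = 22

def drive_license_cost(rack: str) -> int:
    """Per-drive license cost in USD for one rack configuration."""
    return STORAGE_SERVERS[rack] * DRIVES_PER_STORAGE_SERVER * LICENSE_PER_DRIVE_USD

def annual_maintenance(rack: str) -> int:
    """Yearly maintenance in USD at the quoted percentage of the license cost."""
    return drive_license_cost(rack) * ANNUAL_MAINTENANCE_PERCENT // 100

full_rack_license = drive_license_cost("Full Rack")       # 14 * 12 * 10,000
full_rack_maintenance = annual_maintenance("Full Rack")   # 22% of the license cost
```

Integer arithmetic is used throughout so the dollar amounts come out exact.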
[Figure: Oracle RAC Scalability. Effective Nodes (1 to 12) plotted against Data Base Nodes in Cluster (1 to 12), comparing Perfect Linear Performance with measured Effective Nodes (values shown include 1.69 and 2.44), with regions labeled Productive Resources and Wasted Resources]
Oracle RAC characteristics as shown in Dell RAC InfiniBand Study http://www.dell.com/downloads/global/power/ps2q07-20070279-Mahmood.pdf
Smart Analytics Systems Compared to Exadata
Oracle Database Machine    List           Smart Analytics 7700      List
½ Rack (14TB (1))          $5.15M (2)     1 data module (12TB)      $3.67M (3)
Full Rack (28TB)           $10.13M        2 data modules (24TB)     $5.07M (3)
50% LESS
Oracle prices do NOT include
– Installation and migration services
– Additional software, servers and storage required to add analytical applications that are NOT included with Exadata
  • No OLAP, Data or Text Mining
DB2 Compression could increase usable data space 3-5x
– DB2 Compression Option is included in the price of IBM Smart Analytics
DB2 requires 60% of the staffing of Oracle
– Study of 400 Power Systems clients (4)
(1) Data Volumes represent uncompressed user space
(2) Oracle prices include RAC, Partitioning Option, Advanced Compression, Tuning and Diagnostic packs and includes 3 years maintenance and support
(3) IBM prices based on preliminary 7700 list pricing and include 3 years SW and HW maintenance and support.
(4) http://imcomp.torolab.ibm.com/wiki/images/f/f2/FullSolitaire2008report.pdf
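The "50% LESS" callout can be checked against the list prices on this slide. The Python sketch below computes the percentage difference for both rows of the comparison and the usable-space range implied by the 3-5x compression claim. The prices and capacities are copied from the slide; pairing the ½ Rack with 1 data module and the Full Rack with 2 data modules follows the slide's layout, and the function names are made up for illustration.

```python
# (label, raw TB, list price in $M), copied from the comparison table.
COMPARISONS = [
    (("Oracle ½ Rack", 14, 5.15), ("7700, 1 data module", 12, 3.67)),
    (("Oracle Full Rack", 28, 10.13), ("7700, 2 data modules", 24, 5.07)),
]

def percent_less(oracle_price: float, ibm_price: float) -> float:
    """How much lower the IBM list price is, as a percentage of the Oracle price."""
    return (oracle_price - ibm_price) / oracle_price * 100

def compressed_capacity_range(raw_tb: int, low: int = 3, high: int = 5):
    """Usable space in TB if compression yields between low-x and high-x."""
    return raw_tb * low, raw_tb * high

savings = [percent_less(oracle[2], ibm[2]) for oracle, ibm in COMPARISONS]
one_module_range = compressed_capacity_range(12)
```

By this arithmetic the Full Rack row is almost exactly 50% lower, while the ½ Rack row is closer to 29% lower, and in both rows the IBM configuration has slightly less raw capacity than the Oracle one.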