hot technologies of 2012
TRANSCRIPT
Hot Technologies 2012
HOST: Eric Kavanagh
THIS YEAR WAS…
ANALYTIC PLATFORMS
• Analytic Platforms represent the next major phase in the evolution of Business Intelligence and Analytics
• These platforms should foster collaboration and transparency
• Users should be enabled to access and analyze the data they want, quickly and effectively
ANALYST:
Mark Madsen CEO, Third Nature Inc.
ANALYST:
John O’Brien Principal & CEO, Radiant Advisors
GUEST:
Walter Maguire Director of Analytics, ParAccel
THE LINE UP
INTRODUCING
Mark Madsen
© Third Nature Inc.
Philosophical question
When modeling a data warehouse, is it best to:
A. Choose each data element in your schema based on usefulness /usage
or B. Keep every element in the source data?
It would be logical to keep all the data in one place.
I need that data now.
The common situation for analysts
It will take 6 months
Analytics embiggens the data volume problem
Many of the processing problems are O(n²) or worse, so even moderate data can be a problem for DW models & architectures
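To make the O(n²) point concrete, here is a small sketch (not from the talk) that counts the comparisons needed by an all-pairs operation, such as building a similarity matrix. The function name `pairwise_comparisons` and the row counts are illustrative assumptions.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n items: n * (n - 1) / 2, which grows as O(n^2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Each tenfold increase in rows costs roughly a hundredfold increase in work.
    for rows in (1_000, 10_000, 100_000):
        print(f"{rows:>7} rows -> {pairwise_comparisons(rows):>13,} pairs")
```

A table that is small by warehouse standards (100,000 rows) already implies close to five billion pair comparisons.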
Big changes for data warehousing workloads
Much of the analytics data is being read, written and processed interactively with people waiting, or in real time machine to machine contexts. The results of analytic processing can, and often do, feed back into the system from which they originate. Our DW design point was not changing tables, ephemeral patterns, large data movement; it was a pub-sub model.
What do we mean by analytics platform?
Analytics <> BI: different usage model and workload
Deployment environment? What sort?
▪ Batch
▪ Real time
Development or exploration environment? For what?
▪ The process of model building
▪ Exploratory analysis
▪ Analytic data management
A real analytics production workflow (Hatch, CIKM 2011)
Analytic platform design goals
1. Decouple the analytic platform from the data warehouse: it can be a part of the delivery layer, or the integration layer, or both.
2. Support the analytic development and maintenance processes, preferably without unsupported data copying.
3. Support the production deployment processes.
Don’t try to force-fit “offload” and “merge” patterns. To the extent you can do all of this without moving data around, it’s a big win.
Be suspicious of anyone who says Hadoop is the only answer
IT reality is multiple data stores, distributed platform
Separate, purpose-built databases and processing systems for different types of data and query / computing workloads is the new norm for information delivery. Delivery must be separated.
Information delivery layer
1  Marge Inovera   $150,000  Statistician
2  Anita Bath      $120,000  Sewer inspector
3  Ivan Awfulitch  $160,000  Dermatologist
4  Nadia Geddit    $36,000   DBA
Data Warehouse
Databases Documents Flat Files XML Queues ERP Applications
Source Environments
Platform layer
INTRODUCING
John O’Brien
© Copyright 2012 Radiant Advisors. All Rights Reserved v1.00.000
ROLE OF ANALYTIC DBMS IN MODERN BI ARCHITECTURES
Hot Technologies, December 5, 2012
John O’Brien, Radiant Advisors
ROLE OF ANALYTIC DATABASES Modern BI Architectures
Data persistence for optimized BI workloads
• 2-tier versus 3-tier debate
• Why 3-tier will be next generation
Integrating semantics “in” or “above” data
• Cross database versus data virtualization debate
• Why an evolving combination will be next generation
Predictions for 2013 and 2014
MIXED WORKLOAD CAPABILITIES
Modern BI Architectures
3-Tier BI Architecture

                Key Value Store (Hadoop)   Analytic Database Technologies   EDW RDBMS
Accessibility:  Programming                SQL, MDX, UDF                    SQL Access
Workload:       Flexible, Scalable         Analytic Optimized               Reference Data Mgmt
Maturity:       Emerging                   Accepted                         Mature

Key Value Store (Hadoop): Discovery Oriented, Highest Scalability, Lowest Cost, Schema-less, Without Context
MIXED WORKLOAD CAPABILITIES
Modern BI Architectures
Highest Scalability Lowest Cost
Flexibility Schema-less
Without Context
Analytic Workloads
EDW RDBMS
Hadoop Programming Batch Oriented
What’s not to like about this?
• While possible, analytic execution will be slower performing and more time consuming to develop and manage in Hadoop stores for BI teams
• Broad accessibility of BI tools will be a limitation
2-Tier BI Architecture
Broad SQL Accessibility by Users
SEMANTIC INTEGRATION AT DATA
Modern BI Architectures
Diagram:
BI tools (today)
SQL
Hadoop: MapReduce, HCatalog / Hive-QL integration, text
Analytic DBMS: columnar storage, in-memory access, document stores, text analysis, graph analysis, ROLAP/MOLAP
EDW (RDBMS)
Links, Gateways
Semantic Projections
Semantic Discovery
Know when pulling data into ADBMS is ok
SEMANTIC INTEGRATION ABOVE DATA
Modern BI Architectures
Diagram: Future BI tools, SQL / Data Virtualization, Services, MapReduce, HCatalog, Hadoop, Analytic DBMS, EDW, text, In-memory, Semantic Discovery
Where should semantic knowledge live in the architecture?
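As a rough illustration of semantics living "above" the data, the sketch below keeps a mapping from business names to physical tables outside the stores and issues one SQL query across two of them. It is not from the slides: two in-memory SQLite databases stand in for an analytic DBMS and an EDW, and `SEMANTIC_MAP`, `clicks_by_segment`, and all table names are hypothetical.

```python
import sqlite3

# Two separate in-memory SQLite databases stand in for two physical stores.
conn = sqlite3.connect(":memory:")                 # plays the analytic DBMS
conn.execute("ATTACH DATABASE ':memory:' AS edw")  # plays the EDW

conn.execute("CREATE TABLE main.clicks (customer_id INTEGER, page TEXT)")
conn.execute("CREATE TABLE edw.customers (customer_id INTEGER, segment TEXT)")
conn.executemany("INSERT INTO main.clicks VALUES (?, ?)",
                 [(1, "home"), (1, "pricing"), (2, "home")])
conn.executemany("INSERT INTO edw.customers VALUES (?, ?)",
                 [(1, "enterprise"), (2, "smb")])

# The semantic knowledge: business names mapped to physical locations.
SEMANTIC_MAP = {"Clicks": "main.clicks", "Customers": "edw.customers"}


def clicks_by_segment(connection):
    """Answer a business question without the caller knowing where tables live."""
    sql = f"""
        SELECT c.segment, COUNT(*) AS clicks
        FROM {SEMANTIC_MAP['Clicks']} AS k
        JOIN {SEMANTIC_MAP['Customers']} AS c USING (customer_id)
        GROUP BY c.segment
        ORDER BY c.segment
    """
    return connection.execute(sql).fetchall()


if __name__ == "__main__":
    print(clicks_by_segment(conn))
```

If a table moves to a different store, only the mapping changes; the question asked by the BI tool stays the same.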
THINGS TO KEEP AN EYE ON Modern BI Architectures
1. Expect modern BI architectures to evolve in coming years as technologies pave the way
2. Adoption of R and PMML for analytic models to become portable across platforms
3. How vendors push-down execution code in Hadoop or pull-through data into analytic databases
4. Polyglot persistence will optimize on multiple storage engines with service layer access
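Point 2 can be illustrated with a minimal sketch of why PMML helps portability: the model is plain XML, so any platform that can parse it can score with it. The model below is hand-written with made-up values, and the `score` function handles only a linear `RegressionTable` with `NumericPredictor` terms; both are assumptions for illustration, not a complete PMML consumer.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal PMML 4.1 regression model (illustrative values only).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="tenure" coefficient="2.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def score(pmml_text: str, inputs: dict) -> float:
    """Evaluate intercept + sum(coefficient * input) from the first RegressionTable.

    Exponents, categorical predictors, and transformations are ignored here.
    """
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * inputs[predictor.get("name")]
    return result


if __name__ == "__main__":
    print(score(PMML_DOC, {"tenure": 4.0}))
```

The same document could be produced by one tool and consumed by another, which is the portability the slide refers to.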
INTRODUCING
Walter Maguire
Enabling Big Data Applications
Walter Maguire, Director of Analytics
Copyright 2012 ParAccel, Inc.
ParAccel Analytic Platform is…
…built for high performance, interactive analytics.
Integrated Analytics
Basic Analytics
Advanced Analytics
On Demand Integration
Database
Teradata
Hadoop
Streaming Data
Applications
Parallel Processing
Data Scale
Analytic Scale
User Scale
Interactive Scale
ParAccel Analytic Platform
Analytic Engine
Columnar
Compression
Compiled
SQL Optimization
Plan Optimization
Execution Optimization
Comms Optimization
I/O Optimization
In-Memory Option Available
ParAccel technology is the first to deliver on Cooperative Analytic Processing
SQL-Based Business Intelligence and Reporting Tools
Advanced Analytics
Analytic Applications
Machine Data
Operational Data
3rd Party Info Provider
Streaming Data Logs
ParAccel Analytic Platform
On Demand Integration
Enterprise Data Warehouse
Hadoop
Big Data Apps
Embedded Analytics
ParAccel ODI Services makes our platform the analytic engine for entire ecosystems.
1. Share both data and processes in both directions
2. Transform incoming data for analytic performance
3. Interact with many programming languages (Java, Python, more)
4. Persist or stream data through analytic processing
5. Rapidly build new On Demand Integration modules
Machine Data
Operational Data
3rd Party Info Provider
Streaming Data Logs
ParAccel Analytic Platform
On Demand Integration Services
Enterprise Data Warehouse
Hadoop
Big Data Apps
Embedded Analytics
One Size Does Not Fit All: Why an Ecosystem?
Reporting
Dashboards
Static Analysis
OLAP
Analytics
Data Mining
Dynamic Analysis
Complexity
Archiving
Filtering
Text Search
Text Analytics
Transformation
Copyright 2011 ParAccel, Inc.
The Best Way to Do Analytics on Hadoop Data
Create a high-performance, node-to-node, bi-directional connection between Hadoop and an analytic platform that is capable of sharing both data and processes so that the analytic platform becomes an extension of the Hadoop cluster and you can utilize the lingua franca of analytics, SQL.
Read from Hadoop:
    INSERT INTO mytable
    SELECT * FROM HadoopIn(with hfs_name('hadoopfile') mr_job('xyz') pa_schema('mytable'));
Write to Hadoop:
    SELECT num_rows
    FROM HadoopOut(on (select * from mytable) WITH hdfs_name('hadoopfile'));
What’s Next for the Hadoop ODI? HCatalog Integration
• Apache HCatalog is a table and storage management layer for Hadoop
  Provides table abstraction for HDFS file for various data processing tools
• ODI Scan filters
  UDF filters from the SQL will be pushed down to Hadoop as partition filters
Greatly simplify investigative workflow on large volumes of data in Hadoop before bringing it into ParAccel
Simplify development of Hadoop to ParAccel integrations
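The idea of pushing filters down as partition filters can be sketched generically: when data is stored by partition key, a predicate on that key lets the scan skip whole partitions before any rows are moved. The example below is an illustration only, not ParAccel or HCatalog code; `PARTITIONS` and `scan` are hypothetical names and the data is made up.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical partitioned table: partition key (a date string) -> rows.
PARTITIONS: Dict[str, List[dict]] = {
    "2012-12-03": [{"user": "a", "bytes": 10}, {"user": "b", "bytes": 20}],
    "2012-12-04": [{"user": "a", "bytes": 5}],
    "2012-12-05": [{"user": "c", "bytes": 7}, {"user": "a", "bytes": 1}],
}


def scan(partition_filter: Callable[[str], bool]) -> Tuple[List[dict], int]:
    """Read only partitions whose key passes the filter.

    Returns the rows read and the number of partitions touched.
    """
    selected = [key for key in sorted(PARTITIONS) if partition_filter(key)]
    rows = [row for key in selected for row in PARTITIONS[key]]
    return rows, len(selected)


if __name__ == "__main__":
    rows, partitions_read = scan(lambda day: day >= "2012-12-05")
    print(partitions_read, "of", len(PARTITIONS), "partitions read;",
          len(rows), "rows returned")
```

Without the pushed-down filter, all three partitions would be read and the unwanted rows discarded afterwards.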
ODI Services Architecture Overview
Leader Node: ODI Services (Service Mgmt., Service Context)
Compute Nodes (four shown): ODI Services, hosting Perl, Python, Java, Bash, R, etc.
• Job Progress & Status • Installation • Logging • Balancing • Optimization
• Command line executable
• 3rd party interpreter (e.g. Perl, Python, Java VM)
STDIN, STDOUT, STDERR, Metadata Mgmt Framework
Developing and Deploying ODIs
Write command line executable or interpreted script
Test with ODI Services test harness
Load to lead node
Lead node distributes ODI across the compute nodes
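As a rough idea of what the first step might look like, here is a sketch of an interpreted script that reads records from an input stream, writes rows to an output stream, and reports problems on an error stream. The actual ODI Services protocol is not described in these slides, so the `key=value` input format, the tab-separated output, and the `run` function are all assumptions.

```python
import io
from typing import TextIO


def run(stdin: TextIO, stdout: TextIO, stderr: TextIO) -> int:
    """Convert 'key=value' lines to tab-separated rows.

    Malformed lines are reported on stderr. Returns the number of rows written.
    """
    written = 0
    for line in stdin:
        key, sep, value = line.strip().partition("=")
        if sep and key:
            stdout.write(f"{key}\t{value}\n")
            written += 1
        else:
            stderr.write(f"skipped: {line.strip()!r}\n")
    return written


if __name__ == "__main__":
    # Demo with in-memory streams; a deployed script would import sys and
    # call run(sys.stdin, sys.stdout, sys.stderr) instead.
    out, err = io.StringIO(), io.StringIO()
    count = run(io.StringIO("meter=42\ngarbage\nrfid=A7\n"), out, err)
    print(count, "rows written")
    print(out.getvalue(), end="")
```

Keeping the logic in a function that takes streams as parameters makes the script easy to exercise locally before loading it anywhere.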
Developing and Deploying ODIs
o Enables a spectrum of use cases from fast prototyping to one-off and production data loads/unloads
o No need to code to C++ APIs or be exposed to any complexity
o Fast development
o Handles parallelism for you
o Simple protocol
o Logging
o Monitoring progress
ODI services: examples
Event Capture
Smart Meter Logging
RFID Tag Capture
Tweets, Facebook, consolidated social streams
Web services (Salesforce, Eloqua, Omniture, etc.)
Enterprise Semi-Structured sources (Outlook, Gmail, Zendesk, etc.)
Embedded business processes (ex: call center, distribution routing)
Cooperative Analytic Processing is the Future
The Archive Trifecta:
• Inside Analysis www.insideanalysis.com
• SlideShare www.slideshare.net/InsideAnalysis
• YouTube www.youtube.com/user/BloorGroup
THANK YOU!