in the mix: how native integration improves mapreduce and hadoop
TRANSCRIPT
Twitter Tag: #briefrTuesday, May 22, 12
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today’s innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
Twitter Tag: #briefr
Tuesday, May 22, 12
May: Analytics
June: Intelligence
July: Governance
August: Analytics
September: Integration
October: Database
Twitter Tag: #briefr
Tuesday, May 22, 12
Twitter Tag: #briefr
Ultimately analytics is about businesses making optimal decisions, although the range of technologies that inhabit this area is wide: statistical analysis, data mining, process mining, predictive analytics, predictive modeling, business process modeling and complex event processing.
With the advent of big data, analytics has become “big analytics” with organizations diving into large heaps of data that previously was not available or usable.
A major challenge with this market trend is to be able to provide adequate performance for all BI and analytics workloads on the volumes of data that are now being assembled and which are continuously growing.
Tuesday, May 22, 12
Twitter Tag: #briefr
Robin Bloor is Chief Analyst at The Bloor Group.
Tuesday, May 22, 12
Twitter Tag: #briefr
SAP Sybase has a history of database innovation and application from the corporate RDBMS through to the mobile and embedded market.
Sybase IQ has been deployed in many areas of application and is used in many complex predictive analytics deployments, where speed data capacity and versatility are critical.
Recently it has been upgraded to be used in a symbiotic manner with Hadoop in order to provide a comprehensive capability as a BI and analytics engine for Big Data applications
Tuesday, May 22, 12
Twitter Tag: #briefr
David Jonker works in the area of Data Management & Analytics for SAP and is Product Marketing Director for Sybase IQ. In the last 5 years David has led product marketing teams for Sybase’s Data Management & Analytics product lines, including Sybase IQ, Sybase ASE, SQL Anywhere, and Advantage Database Server. His career includes over 10 years in software engineering and product management. Before joining Sybase, David had consulting, product management and software development roles.
Courtney Claussen is a product manager at Sybase, Inc., focusing on Sybase's data warehousing and analytics products. She has enjoyed a 30 year career in software development,
technical support and product marketing in the areas of computer aided design, computer aided software engineering,
database management systems, middleware, and analytics.
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 10
Sybase IQWidespread success
Stands out as the leading enterprise data warehouse among the largest banks, insurance agencies, and telecom operators worldwide
Manage and analyze statistical measures for the entire nation of Canada
Analyze complex models in more than 200 financial institutions worldwide
Analyze ALL Federal tax returns in the US
Store and analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including Transunion, Nielsen and Axiom
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 11
Volume
Managing and harnessing
terabytes of data
SkillsLack of adequate
skills for non-standard platforms
and APIs
Variety
Harmonizing silos of structured and
unstructured data
Velocity
Keeping up with unpredictable data
and query flows
Costs
Too expensive to acquire, operate,
and expand
Dealing with volume, variety, velocity, costs, skills BIG DATA ANALYTICS ISSUES
BIG DATA
ANALYTICS
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 12
Big data analytics
Sybase IQ 15A powerful big data analytics platform in the making
v15.0
2009
VLDB Platform FoundationVolume
v15.4
2011
MapReduce APISkills
v15.3
2011
PlexQ™ MPP FoundationCosts
v15.2
2010
Text Search, Web 2.0 APIVariety
v15.1
2009
In-Database Analytics APIVelocity
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 13
Sybase IQ 15.4A comprehensive platform for big data analytics
SybaseCONTROL CENTER
SybasePOWERDESIGNER
CERTIFITEDISV TOOLS
Web 2.0 Java C/C++ SQL
DMBS
AppServices
Eco-SystemUnstructured Data(Hadoop, Content Mgmt)
Structured Data(DBMS)
Ingest + PersistFederation
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 15
No compromise for complex analyticsBasic to advanced analytical functions available to SQL directly from Sybase IQ engineData never leaves the database until results are materializedAnalytics code / models must be shareable yet must allow AD-HOC analysisAnalytics code / models must be applicable to the latest data setStandards based access, concept extensibility is compulsoryPerformance and scalability is a givenAverage developer must be able to build In-database analytical models
Sybase IQ Process
Built-‐In func6ons External DLL “A”
External DLL “A”
Database = Logic/Filtering
Applied in database
Analy7cs simplified: Logic To Data = Fast + Efficient
In-database analytics in Sybase IQ
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 17
In-database analytics in Sybase IQCustom functions APIs
Several different forms of C++ and JAVA UDF APIs for building custom In-database analytics, each valid at different locations within queries
1.{Scalar} to {Scalar functions} e.g. sin, cosine, …
2.{Scalar set} to {Scalar functions} e.g. max, min, …
3.{Scalar set} to {Scalar set} e.g. OLAP windows, …
4.{Scalar set} to {Tables} e.g. join result sets, …
5.{Scalar set, Tables} to {Tables} e.g. MapReduce, …
All variants are parallelizable, but (5) is also distributable across the PlexQ™ grid
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 18
• Ideal for ISV or custom Data Mining libraries for Healthcare, eCommerce, Public SectorApps include: – ISV partner Zementis built a plug-in
for PMML (Predictive Modeling Markup Language) models– Validates PMML from SAS, R,..– Translates PMML to JAVA UDFs– JAVA UDFs called from SQL
In-database analytics in Sybase IQ Java custom functions
Big Data Use Cases
•External algorithms written as JAVA fns, plugged into Sybase IQ
•JAVA fns via SQL: runs In-Database, much faster than client side
•JAVA fns run protected/fault tolerant (in separate process)
•Supports scalar and table outputs•Supports all data types
CharacteristicsJAVA User
Defined Function offers a new in-
database analytics API
Feature3
Sybase IQ
Plug-In
JAVA UDF
PMML Zementis
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 19
• Ideal for bringing together Big Data Analytics pre-computations from different domains
• Example — In Telecommunication: Sybase IQ with aggregated customer loyalty data & Hadoop with aggregated network utilization data; Quest Toad for Cloud can bring data from both sources, linking customer loyalty to network utilization or network faults (e.g. dropped calls)
App services — integrating Sybase IQ + Hadoop: at client side
SYBASE IQ 15.4 DECONSTRUCTED
Big Data Use Cases
•Client tool capable of querying Sybase IQ and Hadoop
•Currently certified client tool is Quest Toad for Cloud
•Better performance when results from sources are pre-computed/pre-aggregated
CharacteristicsClient side
federation: Join data from
Sybase IQ AND Hadoop at a client application level
Feature6a
Sybase IQ
$
Toad for Cloud Databases
Hadoop Hive
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 20
• Ideal for combining subsets of HDFS unstructured data or summary of HDFS data into Sybase IQ for mid to long term usage in business reports
• Example — In eCommerce: clickstream data from weblogs stored in HDFS and outputs of MR jobs on that data (to study browsing behavior) ETL’d into Sybase IQ. The transactional sales data in Sybase IQ joined with clickstream data to understand and predict customer browsing to buying behavior
Big Data Use Cases• Extract & load subsets of HDFS data into Sybase IQ column store–Raw data from HDFS–Results of Hadoop MR jobs
• HDFS data stored in Sybase IQ is treated like other Sybase IQ data–Gets ACID properties of a DBMS–Can be indexed, joined, parallelized–Can be queried in an ad-hoc way
• Visible to BI and other client tools via Sybase IQ ANSI SQL API only
• Currently, the Apache bulk data transfer utility SQOOP (built by Cloudera) is certified to provide this ETL capability
CharacteristicsLoad Hadoop
data into Sybase IQ column store: Extract, transform,
load data from HDFS (Hadoop Distributed File System) into Sybase IQ schemas
Feature
App services — integrating Sybase IQ + Hadoop: using ETL
SYBASE IQ 15.4 DECONSTRUCTED
6b
ETL
SQOOP
Clickstream Data
Sales Data
Sybase IQHDFS
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 21
• Ideal for combining subsets of HDFS data with Sybase IQ data for operational (transient) business reports
• Example — In Retail: Point Of Sale (POS) detailed data stored in HDFS. Sybase IQ EDW fetches POS data at fixed intervals from HDFS of specific hot selling SKUs, combines with inventory data in Sybase IQ to predict and prevent inventory “stockouts”
Big Data Use Cases• Scan and fetch specified data subsets from HDFS via table UDF–Can read and fetch HDFS data
subsets–Called as part of Sybase IQ SQL
query–Output joinable with Sybase IQ data
• HDFS data not stored in Sybase IQ–Fetched into Sybase IQ In-memory
tables–ACID properties not applicable
• Visible to BI/other client tools via Sybase IQ ANSI SQL API
CharacteristicsJoin HDFS data with Sybase IQ data on the fly: Fetch and join
subsets of HDFS data on-demand
using SQL queries from Sybase IQ
(Data Federation technique)
Feature
App services — integrating Sybase IQ + Hadoop: using Data Federation
SYBASE IQ 15.4 DECONSTRUCTED
6c
POS Data Inventory Data
Sybase IQUDF BridgeHDFS
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 22
App services — integrating Sybase IQ + Hadoop: using Query Federation
SYBASE IQ 15.4 DECONSTRUCTED
• Ideal for combining results of Hadoop MR job results with Sybase IQ data for operational (transient) business reports
• Example – In Utilities: Smart meter and smart grid data can be combined for load monitoring and demand forecast. Smart grid transmission quality data (multi-attribute time series data) stored in HDFS can be computed via Hadoop MR jobs triggered from Sybase IQ and combined with Smart meter data stored in Sybase IQ to analyze demand and workload.
Big Data Use CasesCharacteristics• Trigger and fetch Hadoop MR job results via table UDF
– Can trigger Hadoop MR jobs
– Called as part of Sybase IQ SQL query
– Output joinable with Sybase IQ data
• HDFS data not stored in Sybase IQ
– Fetched into Sybase IQ In-memory tables
– ACID properties not applicable
• Repeated use: put fetched data in tables
• Visible to BI and other client tools via Sybase IQ ANSI SQL API
CharacteristicsCombine results of
Hadoop MR jobs with Sybase IQ data on the fly: Initiate and
Join results of Hadoop MR jobs on-demand using SQL queries
from Sybase IQ data (Query Federation
technique)
Feature6d
Smart Grid Transmission Data
Smart Meter Consumption Data
UDF Bridge Sybase IQHDFS
Tuesday, May 22, 12
© 2012 SAP AG. All rights reserved. 23
Data Discovery (Data Scien7sts)
Applica6on Modeling (Business Analysts)
Reports/Dashboards (BI Programmers)
Business Decisions (Business End Users)
Infrastructure Management
(DBAs)
• Dynamic, elastic PlexQ™ MPP grid–Grow, shrink, provision on-demand–Heavy parallelization
• Load, prepare, mine, report in a workflow–Privacy through isolation of resources–Collaboration through sharing of results/data via sharing of resources
Unique, user community focused platform for big data analyticsSYBASE IQ 15.4
Full Mesh High Speed Interconnect
SAN Fabric
Tuesday, May 22, 12
Thank you
Courtney ClaussenProduct Manager, Sybase [email protected]
David JonkerProduct Marketing Director, Sybase [email protected]
Tuesday, May 22, 12
Most of the Big Data opportunity is, in the end, a Big Analytics opportunity.
There are two challenges in this:
Managing the data and the data flow
Providing acceptable performance for analytics applications
Hadoop and its associated technologies can be both a blessing and a curse.
Twitter Tag: #briefr
Tuesday, May 22, 12
• Hadoop = Key-value store & Parallel processing framework• Some NoSQL databases are DHT-based, some are specialized DBMS• Column-store DBMS vary, but in general they are MPP RDBMS and NewSQL DBMS
Twitter Tag: #briefr
Tuesday, May 22, 12
Twitter Tag: #briefr
Data volumes (includes complexity of data structure)
Concurrency (includes also workload variability)
Computation (is application dependent)
Data flow architecture is a factor
Tuesday, May 22, 12
Twitter Tag: #briefr
In many ways this is similar to the Data Warehouse data flow challenge; writ larger
Latency is about application service levels
This is probably still a three stage process
This is, by the way, a simplification
Tuesday, May 22, 12
Big Analytics is here to stay
In some analytical application areas speed is desirable, in others speed is critical.
Warning: Workloads can be mixed
Analytic speed depends upon the database engine, but also data flow architecture
Business effectiveness depends upon integration with the business process
Twitter Tag: #briefr
Tuesday, May 22, 12
Twitter Tag: #briefr
The prebuilt functions clearly make sense (for speed of processing). Are they intended to make some analytic tools unnecessary or simply to be called directly by such tools?
What does SAP see as the appropriate role(s) for Hadoop in most businesses?
As I understand it, Sybase IQ can fully replace Hadoop in some contexts. What are the situations where you think Hadoop AND Sybase IQ is appropriate?
I’m intrigued by the idea of JOINing data between Hadoop results and Sybase IQ, but I’m not sure of the role of such a capability. How is this different from using MR for data ingest?
As you can link up to Hadoop/Sybase IQ at the front or at the back-end, which would you tend to use when?
Tuesday, May 22, 12
Twitter Tag: #briefr
You speak of broad and comprehensive capability, in combination with Hadoop.
So which areas do you think are sweet spots?
And which kinds of application and/or data collections do you think require different approaches?
Who have been the early adopters of this Hadoop/Sybase IQ capability and what kind of business problems are they trying to solve?
What do you see as SAP HANA’s role in this? Are the same analytical capabilities being added to SAP HANA?
Tuesday, May 22, 12
Twitter Tag: #briefr
May: Analytics
June: Intelligence
July: Governance
August: Analytics
September: Integration
October: Database
Tuesday, May 22, 12