SAP BusinessObjects & Hadoop Brian Wrona June 1, 2012

© 2011 SAP AG. All rights reserved. Confidential

What is Hadoop?

Hadoop is open source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers

Reliable

Software is fault tolerant; it expects and handles hardware and software failures

Scalable

Designed for massive scale of processors, memory, and locally attached storage

Distributed

Handles replication. Offers a massively parallel programming model, MapReduce

Hadoop framework handles: partitioning, scheduling, dispatch, execution, communication, failure handling, monitoring, reporting and more
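
The MapReduce model named above can be illustrated with a minimal, single-process sketch. It is not Hadoop code: the function names and the sample sales records are hypothetical, and everything the framework normally handles (partitioning, scheduling, failure handling) is reduced to an in-memory grouping step.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def sales_mapper(record: Tuple[str, int]) -> Iterator[Pair]:
    """Map step: emit one (region, amount) pair per input record."""
    region, amount = record
    yield region, amount


def sum_reducer(key: str, values: List[int]) -> Pair:
    """Reduce step: combine every value that shares a key."""
    return key, sum(values)


def run_mapreduce(
    records: Iterable[Tuple[str, int]],
    mapper: Callable[[Tuple[str, int]], Iterator[Pair]],
    reducer: Callable[[str, List[int]], Pair],
) -> Dict[str, int]:
    # Shuffle: group emitted values by key, which the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical input: (region, sale amount) records.
totals = run_mapreduce(
    [("north", 120), ("south", 80), ("north", 30)], sales_mapper, sum_reducer
)
```

On a real cluster the map and reduce calls run on different machines, and the grouping step becomes a network shuffle.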

Hadoop Technology Family: Logical View*

Hive: Data warehouse that provides a SQL interface. Data structure is projected ad hoc onto unstructured underlying data

Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts

HBase: Column-oriented, schema-less, distributed database modeled after Google’s BigTable. Random realtime read/write

Mahout: Machine learning libraries for recommendations, clustering, classification and itemsets

Hadoop Common

MapReduce: Distributes & monitors tasks, restarts failed work

HDFS: Distributes & replicates data across machines

Layer labels:

MapReduce
• Parallel programming
• Large block data handling (e.g. 64MB)

Non-Relational DB
• Fine-grained data handling

Scripting

Machine Learning

* For simplicity, mapping to servers is omitted.
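
The large-block handling and replication noted for HDFS lend themselves to a short worked calculation. The sketch below takes the 64MB block size from the slide; the replication factor of 3 is the usual HDFS default, not something the slide states, and the function names are illustrative.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64MB block size, as on the slide
REPLICATION = 3                # common HDFS default; an assumption here


def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a file occupies; the last block may be partly filled."""
    return -(-file_size_bytes // block_size)  # ceiling division on integers


def raw_storage_bytes(file_size_bytes: int, replication: int = REPLICATION) -> int:
    """Disk space consumed across the cluster once every block is replicated."""
    return file_size_bytes * replication


one_gib = 1024 ** 3
blocks = block_count(one_gib)
raw = raw_storage_bytes(one_gib)
```

Under these assumptions a 1 GiB file is split into 16 blocks and consumes 3 GiB of raw disk across the cluster.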

BI4 FP3: A solution leveraging the existing BI 4.0 architecture

Common user experience for all front-ends: Web Intelligence, Crystal Reports, Dashboards, Explorer

Empower all people, enable all workflows

The new information design tool is your point of entry to business intelligence solutions

Best access method for each specific data source: Direct Access, Universe Access

High-performance, feature-rich and secured access

All data sources: SAP BW, Hadoop Hive, Any Relational Database, Web Service, Files, SAP HANA, Sybase

Empower business users with the autonomy they need to access, analyze, enrich, and share information freely and securely using familiar business terms

SAP BusinessObjects Front-end tools

Client tools that support the UNX Universe on Hadoop in 4.0 FP3

Dashboards

Crystal Reports

Explorer

Web Intelligence

Providing Richer Insight: SAP BusinessObjects Explorer

Provide all users with a simple and intuitive experience for immediate, interactive access to information to answer common business questions on the fly

Key New Capabilities

• Casual users create their own compositions of multiple Explorer visualizations
• Exploration views support iOS
• Geographic awareness – new semantic type for geographic dimensions
• Time aware – new semantic type for time dimensions
• Natural visualization and navigation for time and geo dimensions
• Improved search – auto-correction and ‘did you mean’ feature

Explorer FP3 on Hadoop Hive

Information Design Tool on Apache Hadoop Hive

We connect to Hive using a relational connection and a JDBC driver

The connectivity for Amazon EMR is planned for the 4.1 release.
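
The JDBC connection to Hive mentioned above is defined by a connection URL and a driver class. The sketch below only assembles those strings. The host name is hypothetical, and the slide does not say which Hive server version or driver was used, so both common variants are shown as assumptions.

```python
# Driver class names shipped with Apache Hive for its two server flavours.
HIVE_DRIVER_CLASSES = {
    "hiveserver1": "org.apache.hadoop.hive.jdbc.HiveDriver",
    "hiveserver2": "org.apache.hive.jdbc.HiveDriver",
}


def hive_jdbc_url(
    host: str,
    port: int = 10000,          # conventional Hive server port
    database: str = "default",
    hiveserver2: bool = False,
) -> str:
    """Build a Hive JDBC URL; the scheme differs between server flavours."""
    scheme = "jdbc:hive2" if hiveserver2 else "jdbc:hive"
    return f"{scheme}://{host}:{port}/{database}"


# "hive.example.com" is a placeholder host, not one from the presentation.
url_v1 = hive_jdbc_url("hive.example.com")
url_v2 = hive_jdbc_url("hive.example.com", hiveserver2=True)
```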

Semantic Layer: Hadoop support

New Universe format (UNX) via a JDBC relational connection to Hive

Access via tables in Data Foundation

Hive partitioned tables are supported.

The connectivity for Amazon EMR is planned for the 4.1 release.
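
Partitioned tables matter for query cost because Hive stores each partition in its own directory, named column=value, and can skip directories that a filter on the partition column rules out. A minimal sketch of that layout and pruning follows. The table path, column names, and helper functions are hypothetical.

```python
from typing import Dict, List


def partition_path(table_dir: str, spec: Dict[str, str]) -> str:
    """Directory that holds one partition, using the column=value convention."""
    parts = "/".join(f"{column}={value}" for column, value in spec.items())
    return f"{table_dir}/{parts}"


def prune(partitions: List[Dict[str, str]], column: str, value: str) -> List[Dict[str, str]]:
    """Keep only the partitions that a filter `column = value` can match."""
    return [spec for spec in partitions if spec.get(column) == value]


# Hypothetical sales table partitioned by year and month.
all_partitions = [
    {"year": "2011", "month": "12"},
    {"year": "2012", "month": "05"},
    {"year": "2012", "month": "06"},
]
to_scan = [partition_path("/warehouse/sales", p) for p in prune(all_partitions, "year", "2012")]
```

A universe object that filters on the partition column therefore lets Hive read two directories here instead of three.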

Information Design Tool on Hadoop Hive

• A Data Foundation against a Hive schema
• One can draw joins between the Hive tables
• We support Hive tables, aliases, derived tables, Hive views and Hive partitioned tables

Support for multi-source is planned for 4.1.

Querying Hive data

Business users can get their data out of Hadoop in a non-technical manner using the query panel.

Under the covers, we generate a HiveQL statement that Hive then translates into MapReduce tasks.

Sub-query and Ranking features are not supported.

Sampling is supported.
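
To make the query-panel step concrete, here is a rough sketch of turning selected dimensions and measures into a HiveQL string. It is not the SAP generator: the function, table, and column names are hypothetical. The optional sampling uses Hive's TABLESAMPLE clause as one plausible mechanism, since the slide does not say how sampling is implemented.

```python
from typing import List, Optional, Tuple


def build_hiveql(
    table: str,
    dimensions: List[str],
    measures: List[Tuple[str, str]],      # (aggregate function, column)
    sample_buckets: Optional[int] = None,  # read roughly 1/N of the rows
) -> str:
    """Assemble a single-table aggregate query from query-panel selections."""
    select_items = list(dimensions)
    select_items += [f"{agg}({col}) AS {agg.lower()}_{col}" for agg, col in measures]
    from_clause = table
    if sample_buckets:
        # Hive bucket sampling on a random value; "s" is the table alias.
        from_clause += f" TABLESAMPLE(BUCKET 1 OUT OF {sample_buckets} ON rand()) s"
    sql = f"SELECT {', '.join(select_items)} FROM {from_clause}"
    if dimensions and measures:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


full_query = build_hiveql("sales", ["region"], [("SUM", "amount")])
sampled_query = build_hiveql("sales", ["region"], [("SUM", "amount")], sample_buckets=10)
```

The generated statement contains no sub-query or ranking construct, in line with the restrictions listed above.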

Text Analysis

We have 3 speeches in natural language accessible from a Hive table

Text Analysis

Finding recurrent words

• Word extraction and counting are done by Hadoop Hive
• WebI only presents the aggregated count data in a chart

The most frequently occurring words found in the speech.
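
The word extraction and counting pushed to Hive can be expressed with Hive's split and explode functions. The query below is one plausible form, with hypothetical table and column names. The Python function reproduces the same logic locally so the result shape is easy to see.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A plausible HiveQL form of the word count; "speeches" and "speech_text"
# are hypothetical names, not taken from the presentation.
WORD_COUNT_HIVEQL = """
SELECT word, COUNT(*) AS occurrences
FROM speeches LATERAL VIEW explode(split(lower(speech_text), ' ')) w AS word
GROUP BY word
"""


def top_words(texts: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Local equivalent: split each text into words, count, keep the top k."""
    counter: Counter = Counter()
    for text in texts:
        counter.update(word for word in text.lower().split(" ") if word)
    return counter.most_common(k)


top_two = top_words(["a b a", "a b c"], 2)
```

Only the small (word, count) result set travels to the reporting tool; the raw text never leaves the cluster.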

Text Analysis

Finding recurrent word-combinations

Group size is 3 Group size is 4
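
Recurrent word combinations of a given group size are n-grams. Hive provides an ngrams aggregate function that is typically combined with sentences, though the slide does not say whether the demo used it. The sketch below shows the underlying counting in plain Python, with an illustrative function name and sample text.

```python
from collections import Counter
from typing import List


def ngram_counts(words: List[str], n: int) -> Counter:
    """Count every run of n consecutive words (group size n)."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# Hypothetical sample text, not one of the three speeches.
words = "to be or not to be or not".split()
groups_of_3 = ngram_counts(words, 3)
groups_of_4 = ngram_counts(words, 4)
```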

Statistical Analysis

We have numerical data like Salary or Age accessible from a Hive table

Statistical Analysis

Histogram on salary data

• Bin definition and counting are done by Hadoop Hive
• WebI only presents the aggregated count data in a chart

The salary data distribution.
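
One simple way to define bins inside the query is fixed-width bucketing, equivalent to grouping by floor(salary / width) * width in HiveQL. Hive also offers a histogram_numeric aggregate, and the slide does not say which approach was used. The sketch below assumes fixed-width bins and uses made-up salary values.

```python
from collections import Counter
from typing import Dict, Iterable


def histogram(values: Iterable[int], bin_width: int) -> Dict[int, int]:
    """Map each value to the lower edge of its bin and count per bin."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical salaries; bins are 10,000 wide.
salary_bins = histogram([31000, 35500, 42000, 48000, 61000], 10000)
```

The chart then needs only one row per bin, however many salary rows the Hive table holds.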

Statistical Analysis

Summarizing a data set

• All the computations required for the descriptive statistics are pushed to Hadoop
• WebI is used as the presentation layer
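
The descriptive statistics in question map onto standard Hive aggregates such as COUNT, MIN, MAX, AVG and STDDEV_POP. Which ones the demo computed is not stated. As a reference for what those aggregates return, a local Python equivalent might look like this, with illustrative sample values.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Local counterpart of COUNT, MIN, MAX, AVG and STDDEV_POP."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "stddev_pop": statistics.pstdev(values),  # population standard deviation
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```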

Time series

Climate data over time

• Using HiveQL functions, we derive dimension objects from a timestamp
• We also request Hive to perform ad-hoc aggregations
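
Deriving dimensions from a timestamp corresponds to HiveQL date functions such as year() and month(), followed by a GROUP BY on the derived values. The following sketch mirrors that in Python. The readings and the choice of a monthly average are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def monthly_average(readings: Iterable[Tuple[str, float]]) -> Dict[Tuple[int, int], float]:
    """Derive (year, month) from each timestamp, then average per group."""
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for timestamp, temperature in readings:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        buckets[(moment.year, moment.month)].append(temperature)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical climate readings: (timestamp, temperature).
averages = monthly_average([
    ("2011-01-05 12:00:00", 2.0),
    ("2011-01-20 12:00:00", 4.0),
    ("2011-02-01 12:00:00", 6.0),
])
```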

HANA + Hadoop + Data Services + BusinessObjects + SAR

POC Objective

• Provide insight into how a real-life retail problem is solved by SAP HANA-Hadoop
• Analyze 10 years of transactional data = 13 billion records
• Showcase how we can leverage Hot Data (2 years) in SAP HANA with Warm/Cold Data (10 years) in Hadoop
• Demonstrate at SAPPHIRE an efficient SAP HANA-Hadoop Cloud solution leveraging SI for data migration and integration services

Solutions:

• SAP HANA™
• Apache Hadoop™
• SAP BusinessObjects Data Services 4.1
• SAP BusinessObjects 4.0 FP3
• RDS-Sales Analysis for Retail on HANA

Benefits:

• Affordable solution for data storage
• Reduce capital expenditures with cloud management of data storage and SAP Analytics

Architecture diagram labels: Web Intelligence; Explorer; BusinessObjects 4.0 FP3; Hadoop Universe; SAR Universe; Data Services 4.1; HANA; Linux; HOT; COLD/WARM; JDBC; Hive (via SQL); Virtual Private Servers on the Cloud
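
The hot versus warm/cold split can be read as an age-based routing rule: recent data is served from SAP HANA, older data from Hadoop. The sketch below is one possible reading of that rule, not the POC's implementation. The 2-year threshold comes from the slide, while the function name and the 365-day year approximation are assumptions.

```python
from datetime import date

HOT_YEARS = 2  # hot window from the slide


def storage_tier(record_date: date, today: date, hot_years: int = HOT_YEARS) -> str:
    """Return the system expected to serve a record of the given age."""
    age_days = (today - record_date).days
    # Approximation: a year is treated as 365 days.
    return "HANA" if age_days <= hot_years * 365 else "Hadoop"


reference_day = date(2012, 6, 1)
recent = storage_tier(date(2011, 6, 1), reference_day)
old = storage_tier(date(2005, 6, 1), reference_day)
```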

SAP Database Strategy: SAP’s current innovative data assets

SAP Real-time Data Platform

• SAP HANA
• SAP Sybase SQL Anywhere: #1 Mobile and Embedded Database
• SAP Sybase ASE: #1 Transactional Database with Best TCO
• SAP Sybase IQ: #1 Analytics Database with Best TCO
• Sybase ESP, Replication Server, PowerDesigner, + SAP EIM: #1 Unified EIM platform for Real-Time

Open for Partners

Next generation SAP Real-time Data Platform

Architecture diagram labels:

• 3rd Party BI Client
• SAP NetWeaver (On Premise / Cloud)
• Custom Apps
• SAP Business Suite
• SAP Business Warehouse
• SAP Big Data Applications
• SAP Analytics
• SAP Mobile
• Open Developer APIs and Protocols
• Common Landscape Management
• SAP Smart Data Services Platform
• SAP HANA Platform
• SAP Real-time Data Platform
• SAP Sybase ASE
• Common Modeling: Sybase PowerDesigner
• Hadoop
• 3rd Party DB
• MPP Scale-Out
• SAP Sybase SQLA
• SAP Sybase ESP
• SAP Sybase IQ
• SAP Sybase Replication Server
• SAP Data Services
• SAP MDG, MDM

SAP innovation without customer disruption

SAP HANA Vision/roadmap

Integrate, Optimize, Synthesize

Next-generation PaaS offering based on in-memory architecture

Customer Value
• Leverage the power of SAP HANA OnDemand
• Seamless structure for OnPremise and OnDemand integration
• Innovative co-development across the community

Best application and database experience through in-memory optimization

Customer Value
• Deep optimization: SAP applications, SAP HANA, Sybase
• SAP HANA becomes the single application platform for OLAP and OLTP for all applications
• Higher business value at reduced cost of operation

Higher performance through in-memory database

Customer Value
• State of the art in-memory database platform
• Crafted for BW, ERP, B1, ByDesign and all other SAP applications
• Accelerate business by removing most common processing bottlenecks between layers

SAP innovation without customer disruption

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation. IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9, iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server, PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes, BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX, Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.

Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.

Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.

Oracle is a registered trademark of Oracle Corporation.

UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.

Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.

HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.

Java is a registered trademark of Sun Microsystems, Inc.

JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.

© 2012 SAP AG. All rights reserved

Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.

Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company.

All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.

This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This document contains only intended strategies, developments, and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of business, product strategy, and/or development. Please note that this document is subject to change and may be changed by SAP at any time without notice.

SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.

SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitation shall not apply in cases of intent or gross negligence.

The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.