Rumbo 2020
FTS INTERNAL
HANA & Hadoop
SAP FORUM
Javier Fernandez Leon
February 2016
Intel Inside®. Powerful Solution Outside. Powered by Intel® Xeon® processor. More information: www.descubrefujitsu.com/SAPforum
FTS INTERNAL Copyright 2014 FUJITSU LIMITED
HANA & HADOOP Intro
CONTENTS
• Challenges of distributed Big Data
• What is Apache Hadoop? Features
• Comparison HANA vs Hadoop
• HANA & Apache Spark
• HANA & Hadoop combined. Scenarios
• Use Cases HANA & Hadoop
• Managed service: pay-per-use model for HANA & Hadoop
2 FUJITSU CONFIDENTIAL
© 2015 FUJITSU
WE ARE DROWNING IN OUR OWN DATA
Challenges of distributed Big Data
Inefficient Data Processing
Real-time drill-down interaction is impossible when data is distributed across thousands of nodes and processed in batches.
Lack of Business Alignment
Business decisions must follow changing external market conditions, which requires processing data from business systems together with data held in Hadoop data lakes.
Costly Management of Big Data
Large volumes of data clog business systems, although much of that data could be archived more efficiently on less expensive systems.
WE ARE DROWNING IN OUR OWN DATA
Gap between the Enterprise & Big Data Frameworks
Enterprise Core Systems
Complexity
Performance
Unable to work together
….
Big Data Frameworks & Tools
Objectives: standardize, simplify and automate both worlds.
HADOOP
What is Apache Hadoop?
APACHE HADOOP is open source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers
RELIABLE: The software is fault tolerant; it expects and handles hardware and software failures
SCALABLE: Designed for massive scale of processors, memory and locally attached storage, up to petabytes
DISTRIBUTED: Handles replication and offers a massively parallel programming model, MapReduce
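The MapReduce model mentioned above can be illustrated with a minimal single-process sketch. This is plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase) and the sample input are assumptions made for illustration. On a real cluster each phase would run in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input splits; on a cluster each would live on a different node.
splits = ["big data on Hadoop", "Hadoop stores big files"]
counts = word_count(splits)
```

Because map and reduce only see one split or one key at a time, the framework can distribute them freely, which is what makes the model scale out.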
HADOOP
Hadoop Logical Components
HADOOP
What does Hadoop bring to the Table?
Cost-efficient data storage and processing for large volumes of structured, semi-structured and unstructured data such as web logs, machine data, text data, call data records, audio, video data….
BATCH PROCESSING: Where fast response times are less critical than reliability and scalability
COMPLEX INFORMATION PROCESSING: Enables heavily recursive algorithms, machine learning and queries that cannot be easily expressed in SQL
LOW VALUE DATA ARCHIVE: Data stays available, though access is slower. Scales up to petabytes
POST-HOC ANALYSIS: Mine raw data that is either schema-less or whose schema changes over time
HADOOP
Who uses Hadoop?
FACEBOOK: Facebook runs the world’s largest Hadoop cluster. Just one of several Hadoop clusters operated by the company spans more than 4,000 machines and houses over 100 petabytes of data. Facebook uses Hadoop for messaging (HBase) and to generate reports for advertisers who need to track the effectiveness of their campaigns
TWITTER: Twitter uses Hadoop for product analysis, social graph analysis, generating indices for people search, natural language processing and many other applications
YAHOO: Yahoo runs Hadoop on 42,000 servers (1,200 racks) in four data centers. Its largest Hadoop cluster had 4,000 nodes. Yahoo uses it to index web crawl results
HADOOP & HANA
Comparison Hadoop & HANA
CRITERION | HADOOP | SAP HANA
Data Architecture | Unstructured data and files on disk | Structured data in memory
Data Structures | No predefined schema | Predefined schema & models
Performance | Very slow data access (seconds to hours) | Very fast access (~<1 ms)
Scalability | Scale-out to thousands of low-cost servers | Scale-up / scale-out to many servers
Data Consistency | BASE (basically available, soft state, eventual consistency) | ACID (atomicity, consistency, isolation, durability)
Licensing Costs | Free open source or commercial distros | Many options: cloud, enterprise…
OLTP | No OLTP | Excellent OLTP
OLAP | Slow OLAP | Excellent OLAP
Server Failover | Query & server failover | Server failover
Enterprise Admin Tools | Small | Excellent
HADOOP & HANA
Combination of HANA & Hadoop
SAP HANA = Instant results
HADOOP = Infinite storage + Raw Data
SAP HANA & Hadoop = Instant access + Infinite scale
SMART DATA ACCESS ( SDA)
Connection to HANA
Benefits
• Enables access to remote data just like a “local” table
• Smart query processing, including query decomposition with predicate push-down and functional compensation
• Supports data location agnostic development
• No special syntax to access heterogeneous data sources
• Not restricted only to Hadoop
Heterogeneous data sources
• Oracle, MS SQL, Teradata, DB2, Netezza
• Hadoop: Hive, vUDF, Spark
• SAP HANA (BWoH, SoH)
• SAP Sybase ASE, IQ, MaxDB
• SAP Sybase ESP, SQLA
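Predicate push-down, listed among the benefits above, can be illustrated with a toy model. This is not SDA code: the remote source is simulated with an in-memory list, and every name (REMOTE_SALES, remote_scan, the store codes) is hypothetical. The point is only that the filter runs where the data lives, so fewer rows travel to HANA.

```python
# Hypothetical rows living in a remote source such as Hive.
REMOTE_SALES = [
    {"store": "MAD", "amount": 120},
    {"store": "BCN", "amount": 80},
    {"store": "MAD", "amount": 45},
    {"store": "SVQ", "amount": 200},
]


def remote_scan(predicate=None):
    """Simulate the remote source; it applies the predicate if one is pushed down."""
    if predicate is None:
        return list(REMOTE_SALES)
    return [row for row in REMOTE_SALES if predicate(row)]


def query_without_pushdown(predicate):
    """Fetch every row, then filter locally."""
    transferred = remote_scan()
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def query_with_pushdown(predicate):
    """Send the filter to the remote source; only matching rows are transferred."""
    transferred = remote_scan(predicate)
    return transferred, len(transferred)


def is_madrid(row):
    return row["store"] == "MAD"


naive_rows, naive_transferred = query_without_pushdown(is_madrid)
pushed_rows, pushed_transferred = query_with_pushdown(is_madrid)
```

Both variants return the same rows; the difference is the number of rows moved between systems (4 versus 2 in this sample).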
SCENARIO HADOOP - HANA
Example of scenario for bringing both worlds - POS
APACHE SPARK
Spark
• Very fast in-memory data-processing framework, claimed to be up to 100x faster than Hadoop MapReduce
• Unlike Hadoop MapReduce, supports both batch and streaming analysis --> a single framework for batch and near-real-time use cases
• Spark requires:
  1) Cluster management: standalone, Hadoop YARN, Apache
  2) Distributed storage system: supports HDFS, Cassandra, OpenStack Swift, Amazon S3
• All Hadoop connectors can be leveraged in Spark
• If you are going to start with Hadoop now, you should do it with Spark
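The "single framework for batch and near real time" idea can be sketched without Spark itself. The following plain-Python example is an assumption-laden illustration, not the Spark API: the same business logic (count_by_sensor, a hypothetical name) is applied once to a full data set and once to small micro-batches whose results are merged into running state.

```python
from collections import Counter


def count_by_sensor(records):
    """The shared business logic: count events per sensor id."""
    return Counter(sensor for sensor, _value in records)


def run_batch(records):
    """Batch mode: process the full data set in one pass."""
    return count_by_sensor(records)


def run_micro_batches(stream, batch_size):
    """Streaming mode: apply the same logic to small batches and merge the state."""
    state = Counter()
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            state.update(count_by_sensor(batch))
            batch = []
    if batch:  # flush the final partial batch
        state.update(count_by_sensor(batch))
    return state


# Hypothetical (sensor id, value) events.
events = [("s1", 0.4), ("s2", 0.9), ("s1", 0.5), ("s3", 0.1), ("s1", 0.7)]
batch_result = run_batch(events)
stream_result = run_micro_batches(iter(events), batch_size=2)
```

Both modes produce the same counts, which is the practical benefit of writing the logic once for both kinds of workload.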
WHAT IS INSIDE?
SAP HANA Vora
HANA Vora is an in-memory query engine that leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
• HANA Spark Adapter for improved performance between distributed systems
• Compiled queries enable applications and data analysis to work more efficiently across nodes
• Familiar OLAP experience on Hadoop to derive business insights from Big Data, such as drill-down into HDFS data
• Integration of SAP data with data lakes
• HANA connectivity on Hadoop
• Enterprise analytics (hierarchies) and interactive SQL on Hadoop data
• Data tiering from HANA to Hadoop for OLAP scenarios using DLM
• Archiving of ERP data to Hadoop using ILM
USE CASE : IoT for a Turbine
SAP HANA Vora
• Sensors stream data continuously
• Sensors typically structured in a Hierarchy
• Information about the hierarchy is typically stored in the ERP system
• Information important for error detection: two sensors
ROLE OF HANA VORA
• Provides OLAP capabilities: joins the hierarchy with IoT data
• Bridges the gap between enterprise systems and the cluster: the BOM of the turbine is easily accessible
• Performance of in-memory computing on both enterprise and cluster processing
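Joining a hierarchy with IoT data can be sketched as follows. This is plain Python rather than Vora SQL, and the hierarchy, sensor names and readings are all hypothetical. Each reading is rolled up to every ancestor of its sensor, giving an OLAP-style average per hierarchy level.

```python
from collections import defaultdict

# Hypothetical hierarchy as it might be kept in the ERP system: child -> parent.
HIERARCHY = {
    "sensor_a": "rotor",
    "sensor_b": "rotor",
    "sensor_c": "gearbox",
    "rotor": "turbine_1",
    "gearbox": "turbine_1",
}

# Hypothetical sensor readings as they might arrive on the cluster.
READINGS = [("sensor_a", 71.0), ("sensor_b", 69.0), ("sensor_c", 90.0), ("sensor_a", 73.0)]


def ancestors(node):
    """Yield the node itself and every parent up to the root."""
    while node is not None:
        yield node
        node = HIERARCHY.get(node)


def average_per_node(readings):
    """Join readings with the hierarchy and average them at every level."""
    totals = defaultdict(lambda: [0.0, 0])  # node -> [sum, count]
    for sensor, value in readings:
        for node in ancestors(sensor):
            totals[node][0] += value
            totals[node][1] += 1
    return {node: total / count for node, (total, count) in totals.items()}


averages = average_per_node(READINGS)
```

With this sample, the rotor averages 71.0 and the whole turbine 75.75, so an outlier such as the gearbox sensor stands out when drilling down.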
Key Scenarios
Example of Scenarios
Key Scenarios
• Flexible data store – Using Hadoop as a flexible store of data captured from multiple sources, including SAP and non-SAP software, enterprise software, and externally sourced data
• Simple database – Using Hadoop as a simple database for storing and retrieving data in very large data sets
• Processing engine – Using the computation engine in Hadoop to execute business logic or some other process
• Data analytics – Mining data held in Hadoop for business intelligence and analytics
EXAMPLE OF USE SCENARIOS
Key Scenarios - Architecture
EXAMPLE OF USE SCENARIOS
Key Scenarios – Hadoop as Flexible Data Store
SCENARIO | DESCRIPTION | SAMPLE USE CASES | COMMENT
Social Media | Real-time capture of data from social media sites, especially unstructured text | Comments on products on Twitter, Facebook, and Amazon | Combine social media data with other data, for example CRM data or product data, in real time to gain insight
Data Stream Capture | Real-time capture of high-volume, rapidly arriving data streams | Smart meters, factory floor machines, real-time web logs, sensors in vehicles |
Data Archive | Capture of archive logs that would otherwise be sent to off-line storage | Archive data or computer system logs | Lower costs when compared with conventional solutions
OLTP Transaction Data | Long-term persistence of transactional data from historical online transaction processing (OLTP) | Call center, inventory.. |
EXAMPLE OF USE SCENARIOS
Key Scenarios – Hadoop as Flexible Data Store
SCENARIO | DESCRIPTION | SAMPLE USE CASES | COMMENT
Reference Data | Copy of existing large reference data sets | Census surveys, GIS, large industry-specific data sets, weather measurement and tracking systems | Store reference data alongside other data in one place to make it easier to combine for analytic purposes
E-mail Histories | Capture logs of the e-mail correspondence a company sends and receives | Fulfillment of legal requirements for e-mail persistence and for use in analytics | Combine data from e-mail with other data to support, for example, risk management
Document & Multimedia Storage | Capture of business documents generated and received by the business (BLOBs) | Healthcare, insurance and other businesses that generate or use large volumes of documents that must be kept for extended periods | Store an unlimited number of documents in Hadoop, for example, using HBase
EXAMPLE OF USE SCENARIOS
Key Scenarios – Hadoop as Processing Engine
Use Hadoop as a data processing engine for ETL rationalization to feed SAP HANA
• MapReduce programs execute process logic
• Pig for data analysis
• Mahout for data mining and machine learning
• Replicate master data to Hadoop for data processing
• Feed results to SAP HANA with Data Services and merge with the conformed model
EXAMPLE OF USE SCENARIOS
Key Scenarios – Hadoop as Processing Engine
SCENARIO | DESCRIPTION | SAMPLE USE CASES | COMMENT
ETL Rationalization | Low-latency ingestion of data from operational systems | Tiered storage: high-value data loaded and transformed in HANA in parallel, preprocessing off-loaded to Hadoop |
Identify Differences | Differences in large, but similar sets of data | DNA analysis | Hadoop using MapReduce
Risk Analysis | Look for known patterns in data in Hadoop that suggest risky behavior | Risk in credit cards; rogue traders | Da
Data Cleansing and Enrichment | Fix data issues. Enhance with additional information | Add demographic or other data to, for example, customer web logs |
Data Mining | Look for patterns, data clusters, and correlations in Hadoop | Analyze machine data to predict; correlate customer behaviour | Requires Mahout
EXAMPLE OF USE SCENARIOS
Key Scenarios – Hadoop & HANA for Analytics
• The data volume in Hadoop is sometimes so large that it can't be replicated into SAP HANA in a cost-effective or timely manner
• Some of the analysis must be done in Hadoop as well as in SAP HANA
• Hadoop queries require longer processing times than SAP HANA
• Analysis will likely require combining data from Hadoop, SAP HANA and other sources
• Two approaches:
  • Two-Phase Analytics: run analysis continually on Hadoop, then send periodic updates to SAP HANA for fast interactive query response
  • Federated Queries:
    • Split the analysis into parts and run them asynchronously on Hadoop and SAP HANA
    • Federate results in SAP HANA or BI
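The federated-query approach can be sketched in a few lines. This is a minimal illustration in plain Python, assuming two in-memory lists stand in for the Hadoop and SAP HANA data sets; the data, the function names and the revenue-per-country query are hypothetical. Each side computes a partial aggregate concurrently and the results are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: raw history in Hadoop, recent curated data in SAP HANA.
HADOOP_ROWS = [("ES", 10), ("DE", 7), ("ES", 3)]
HANA_ROWS = [("ES", 5), ("FR", 2)]


def partial_revenue(rows):
    """Each system computes its own partial aggregate (revenue per country)."""
    totals = Counter()
    for country, amount in rows:
        totals[country] += amount
    return totals


def federated_revenue():
    """Run both partial queries concurrently, then federate the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        hadoop_part = pool.submit(partial_revenue, HADOOP_ROWS)
        hana_part = pool.submit(partial_revenue, HANA_ROWS)
        return hadoop_part.result() + hana_part.result()


revenue = federated_revenue()
```

This works because a sum can be split into partial sums and merged afterwards; aggregates that do not decompose this way (a median, for instance) need a different split.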
EXAMPLE OF USE SCENARIOS – Two-Phase Analytics
Key Scenarios – Hadoop & HANA for Analytics
EXAMPLE OF USE SCENARIOS – Federated Queries
Key Scenarios – Hadoop & HANA for Analytics
USE CASES
Use Cases - Healthcare
EXAMPLE
Use Cases - Healthcare
EXAMPLE OF USE SCENARIOS
Use Cases – Predictive Maintenance
Business Challenges
A computer server manufacturer wants to implement effective preventative maintenance by identifying problems as they arise and then taking prompt action to prevent the same problem from occurring at other customer sites.
Technical Challenges
• Identifying problems by analyzing text data from call centers and customer questionnaires together with server logs generated by their hardware
• Combining results with CRM, sales and manufacturing data to predict which servers are likely to have problems in the future
Solution
• Use SAP Data Services to analyze call center data and questionnaires stored in Hadoop and identify potential problems
• Use HANA to merge results from Hadoop with server logs to identify indicators in those logs of potential problems
• Combine with CRM, bill of material and production/manufacturing data to identify cases where preventative maintenance would help
Pay per use Models for HANA & Hadoop
Service model defined by 5 parameters
EXAMPLE: SAP ERP 6.0 PRODUCTION system
Five standard parameters, qualitative and quantitative, define the SAP service
These parameters reflect usage! These parameters reflect the SLAs!
• Availability class: 99.5%
• Disaster-recovery class: DR, local HA,….
• Additional certification(s): ISAE3402 (SOX), SAS70…
• Managed operations: 24 × 7
• Managed performance: dialog response time 90% < 1 sec.
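Two of these parameters translate directly into arithmetic. The sketch below assumes a 30-day month for the availability calculation and uses hypothetical response-time measurements; the function names are illustrative and not part of any SAP or Fujitsu tool.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # assumption: a 30-day billing month


def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Downtime budget implied by an availability percentage over a period."""
    return period_minutes * (100 - availability_pct) / 100


def meets_response_sla(response_times_sec, share=0.90, limit_sec=1.0):
    """True if at least `share` of dialog steps finished in under `limit_sec`."""
    fast = sum(1 for t in response_times_sec if t < limit_sec)
    return fast / len(response_times_sec) >= share


downtime = allowed_downtime_minutes(99.5)

# Hypothetical dialog response times in seconds.
sample = [0.3, 0.4, 0.2, 0.9, 0.5, 0.6, 0.7, 0.8, 0.1, 1.4]
sla_ok = meets_response_sla(sample)
```

Under these assumptions, 99.5% availability leaves a budget of 216 minutes of downtime per month, and the sample meets the response-time target because 9 of 10 dialog steps finish in under one second.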
SLAs verifiable from within SAP
Transactions represent the actual usage of the SAP system and are linked to the business
And what about SAP HANA?
HANA in the cloud in pay-per-use mode - vHANA
vHANA CLOUD
MONTHLY PAYMENT BASED ON THE MEMORY CONSUMED IN HANA
INCLUDED SERVICES
Hadoop in Pay Per Use based on Openstack
Service Governance (Service Desk, Service Management)
Level 5: Hadoop Integration with SAP HANA (Administration, Connectivity…)
Level 4: HADOOP PLATFORM Services (Administration/Monitoring, Backup & Recovery, patches, upgrades …)
Level 3: OPENSTACK System Services (Administration/Monitoring, patches, upgrades ...)
Level 2: OPENSTACK FRAMEWORK (Ceph, Neutron, Nova, Heat….)
Level 1: Data Center and Network Services (Administration, Monitoring, Capacity Management)
Hadoop in Pay Per Use based on Openstack
HADOOP CLOUD
MONTHLY PAYMENT FOR THE MANAGED SERVICE BASED ON THE MEMORY/CPU CONSUMED BY HADOOP
INCLUDED SERVICES
Take Aways
TAKE AWAYS
Summary
• Hadoop excels at very large scale, low cost per TB and data-type flexibility
• SAP HANA excels at speed and structure, and is fully integrated with the Business Suite enterprise logic
• Leverage the strengths of both platforms in data store, data processing and analytics scenarios
• Carefully evaluate your requirements and use case against these scenarios
• If you are about to start with Hadoop, use Apache Spark & Vora
• Both can be deployed in a simple, pay-per-use model by Fujitsu