SAP HANA In Memory Appliance
TABLE OF CONTENTS
1. In Memory Computing
1.1. Move from After-Event Analysis to Real-Time Decision Making
2. SAP In-Memory Appliance (SAP HANA)
2.1. SAP HANA In Memory Computing Engine & Surroundings
2.2. Modelling & Data Loading into SAP HANA
3. SAP High-Performance Analytic Appliance 1.0 Overview
3.1. Technical Overview – Request Processing and Execution Control
3.2. Calc Engine
3.3. SAP HANA Replication Technologies
3.4. Row Store
3.4.1. Row store Architecture/Block diagram
3.4.2. Row Store Architecture Operations flow
3.4.3. Indexes for Row Store tables
3.5. Column Store
3.5.1. Column Store Operations flow
3.5.2. Delta management in Column store
3.6. Persistence Layer
3.7. Modeling
3.8. SAP HANA In memory Computing Studio
3.8.1. SLT integration in SAP HANA
3.8.2. Information Modeler Terminology
3.9. HANA Modeling Process Flow
4. SAP HANA Backup and Recovery
5. HANA Proof of Concept – Oil & Gas Industry
SAP HANA – High Performance Analytical Appliance
1. In Memory Computing:
In-memory computing is a technology that processes massive quantities of real-time data in the main memory of the server to provide
immediate results from analyses and transactions. It leverages multicore architectures equipped with large volumes of directly
accessible memory to crunch through very large data sets in seconds. SAP has designed special CPU-cache-conscious data
structures and parallelized algorithms to fully exploit this hardware and deliver extreme performance for a new generation of
information-rich enterprise applications. In-memory computing is delivered via the SAP HANA appliance, which incorporates the
SAP HANA database.
Advantages:
Handle real-time data alongside historical data with high throughput, enabling flexible real-time analytics, improved
business performance and competitive advantage.
Make better decisions faster - Instant access to relevant information opens new ways to look at the business and lets
users react quickly to real-time information, with less reliance on IT to gain the insight they need.
Enable innovative new applications - Combine high-volume transactions with analytics for improved BI; accelerate
transactional and operational systems for real-time access and better decision making; enable planning and
forecasting applications based on real-time operational data combined with analytics.
Reduce IT burden and mitigate risks - SAP HANA dramatically reduces hardware and maintenance costs; in-memory
solutions are based on proven, mature technology and are non-disruptive and fast to implement.
In-memory computing can also improve profitability analysis (CO-PA) by significantly reducing report run times.
It can likewise be used for agile data marts, giving business departments a higher level of
flexibility when creating ad-hoc data marts for specific business problems.
Note: Dramatically improved hardware economics and technology innovations in software have now made it possible for SAP to
deliver on its vision of the Real-Time Enterprise with in-memory business applications.
1.1. Move from After-Event Analysis to Real-Time Decision Making – with SAP ICE (In Memory Computing Engine)
The persistently increasing quantity of data from enterprise applications and from the web is a great opportunity but also
a challenge. Comprehensive data from different sources, such as operational systems, data warehouses and
the web, enables intensive analysis, but is difficult to manage and often causes unacceptable response times. And time is
money! In fact, slow access can even prevent businesses from analyzing the data that would allow them to make informed
decisions. Some interesting queries take not just hours but days, and once the result arrives, it is too late for an
immediate reaction, as the underlying data has already changed.
SAP ICE helps to overcome such hurdles, as huge amounts of real-time data can be processed in the main memory of a
server, thus dramatically accelerating data access for analysis. From the business point of view this enables faster decisions
based on in-depth data analysis.
In our case SAP HANA, based on innovative in-memory technology, does not simply accelerate data access, but provides a
quantum leap in data analyses by giving you access to transactional data at your fingertips. And SAP HANA completely
changes the way in which data can be used.
2. SAP In-Memory Appliance (SAP HANA)
SAP HANA is a modern platform for real-time analytics and applications. It enables organizations to analyze business
operations based on large volumes and a variety of detailed data in real time, as it happens. SAP in-memory computing is the
core technology underlying the SAP HANA platform.
SAP HANA is a combination of hardware and software specifically made to process massive real time data using In-Memory
computing. It is an in-memory engine with database, data integration and aggregation capabilities for analyzing operational
and transactional databases.
SAP HANA is a flexible, data-source-agnostic appliance that allows customers to analyze large volumes of SAP ERP data in
real time, avoiding the need to materialize transformations.
SAP HANA integrates a number of SAP components including the SAP In-Memory Database, Sybase Replication technology
and SAP LT (Landscape Transformation) Replicator.
SAP In-Memory Database - The SAP In-Memory Database is a hybrid in-memory database that combines row-based, column-
based, and object-based database technology. It is optimized to exploit the parallel processing capabilities of modern
multi-core CPU architectures. With this architecture, SAP applications can benefit from current hardware technologies.
The SAP In-Memory Database is at the heart of SAP offerings like SAP HANA that help customers to improve their operational
efficiency, agility, and flexibility.
The appliance is designed to facilitate the integration into existing compute centers. It uses standard communication protocols
such as ODBC and JDBC to communicate with other systems.
In addition to real-time analytics, SAP is also delivering a new class of real-time applications powered by the SAP HANA
platform. The platform can be deployed as an appliance or delivered via the cloud.
As an analogy, think of HANA as the engine of a car and BW as the body of the car: HANA would sit inside the OLAP BW
system for business operations, needs and analysis. HANA is currently in its ramp-up phase, so it has to be tested with all sorts
of databases to take SAP HANA from proof of concept (POC) to go-live with SAP customers. For this reason, the final shape of
the "body of the car" is not yet known.
2.1. SAP HANA In Memory Computing Engine & Surroundings:
SAP HANA is a preconfigured, out-of-the-box appliance:
In-memory computing engine
In-memory computing studio as a frontend for modelling and administration
HANA is connected to ERP systems; the frontend modelling studio can be used for load control and replication server
management
Two types of relational data stores in HANA: Row Store and Column Store
SAP BOBJ tools can report directly on HANA
Data from HANA can also be consumed in MS Excel
Row Store – A traditional relational store; the difference is that in HANA all rows are held in memory, whereas
traditional databases store them on a hard drive.
Column Store – The data is stored in columns like in SAP BWA
Persistency Layer: Main memory is fast but volatile, and data can be lost in a power outage or hardware failure. To
avoid this, HANA has a persistency layer component which ensures that all data in memory is also stored on a
non-volatile hard drive.
Session Management: This component takes care of logon services
Two processing engines – The data is in memory, but how is it extracted and reported on? HANA has
two processing engines: one based on SQL, which accepts SQL queries, and the other based on MDX.
HANA supports Sybase Replication Server – Sybase Replication Server can be used for real-time synchronization of
data between ERP and HANA
2.2. Modelling & Data Loading into SAP HANA:
Modelling in HANA can be done in the following ways.
Specify which tables are stored in HANA; the first step is to get the metadata, then schedule data replication jobs and
load the data into HANA using the Replication Server.
Use Data Services to model and load the data from SAP BW and other third-party systems.
Manage connections to ERP instances; the current release does not support connecting to several ERP instances.
Do modelling in the HANA in-memory studio itself (this is independent of Data Services).
Modelling can also be done in Business Objects Universes, which is essentially joining fact and
dimension tables.
Reporting:
Client tools such as MS Excel, SAP BI 4.0 reporting tools and the Dashboard Design Tool (Xcelsius)
can access HANA directly.
Third-party reporting tools can leverage the ODBC, JDBC and ODBO (for MDX requests) drivers in HANA for reporting.
HANA supports BICS interface
Note: Administration, including memory issues, can be managed from SAP HANA studio.
3. SAP High-Performance Analytic Appliance 1.0 Overview.
SAP HANA in memory Appliance:
An appliance for processing high volumes of transactional data in real time
Includes tools for data modeling, data and lifecycle management, security, operations
Provides support for multiple interfaces based on industry standards
Features:
In-Memory software bundled with hardware delivered by hardware partners (HP, IBM, Cisco and Fujitsu).
In-Memory Computing Engine.
Tools for data modeling, data and life cycle management, security, operations, etc.
Real-time Data replication via Sybase Replication Server.
Support for multiple interfaces.
Content Packages (Extractors and Data Models) introduced over time.
Analyze information in real-time at unprecedented speeds on large volumes of non-aggregated data.
Create flexible analytic models based on real-time and historic business data.
Foundation for a new category of applications (e.g., planning, simulation) that significantly outperform current
applications in their category.
3.1. Technical Overview – Request Processing and Execution Control
In SAP HANA, query execution follows a few steps depending on the query type. In-memory computing uses
calculation models, which give extreme performance and flexibility with calculations on the fly.
A calc model can be generated on the fly from an input script; it also defines a parameterized calculation schema for highly
optimized, reusable queries, because the calculation model supports all types of scripted operations.
Once SQL or MDX statements are passed to calculation models, the optimizer included in the calculation engine optimizes the
input statements for better performance.
As the architectural diagram above shows, there are multiple interfaces (SQL Script, MDX and the planning engine interface)
for multiple query types. All these domain-specific programming languages and models are converted into calculation models,
while standard SQL is processed directly by the database engine.
After a calculation model has been defined, the Calculation Engine creates a logical execution plan for it,
which defines the priorities for the steps in the operations and executes user-defined functions.
The Relational Engine, an in-memory database component, is responsible for the physical execution plan:
its database optimizer produces the physical execution plan, taking performance and turnaround
time into account for query execution.
3.2. Calc Engine:
The query execution flow is defined here; the operations in the query are executed according to the priorities of the instructions
in the query. Regardless of those priorities, the system uses maximum resources to achieve maximum throughput.
The easiest way to think of Calculation Models is to see them as dataflow graphs, where the modeler can define data sources
as inputs and different operations (join, aggregation, projection…) on top of them for data manipulations.
The Calculation Engine will break up a model, for example some SQL Script, into operations that can be processed in parallel
(rule based model optimizer). Then these operations will be passed to the database optimizer which will determine the best
plan for accessing row or column stores (algebraic transformations and cost based optimizations based on database
statistics).
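The decomposition described above can be pictured with a small sketch (illustrative only, not SAP code): a calculation model viewed as a dataflow graph, where operations whose inputs are ready run in parallel "waves", mirroring how the rule-based optimizer breaks a model into parallelizable operations. The operation names are hypothetical.

```python
# Conceptual sketch: schedule a calculation model (a dependency graph of
# operations) into waves; every operation in a wave could run in parallel.
def schedule_waves(ops):
    """ops: dict op_name -> set of op_names it depends on."""
    remaining = {name: set(deps) for name, deps in ops.items()}
    waves = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle in calculation model")
        waves.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves

# Two table scans feed a join, which feeds an aggregation, then a projection.
model = {
    "scan_T1": set(), "scan_T2": set(),
    "join": {"scan_T1", "scan_T2"},
    "aggregate": {"join"},
    "project": {"aggregate"},
}
print(schedule_waves(model))
# -> [['scan_T1', 'scan_T2'], ['join'], ['aggregate'], ['project']]
```

The two scans land in the same wave because neither depends on the other; the database optimizer would then pick the cheapest physical access path for each.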
Note: Planning Engine Will be included in next release. Will include planning functions like distribute and copy functions.
Example SQL Function Execution:
CREATE FUNCTION func1 (IN p1 INT, IN t1 ttype1, IN t2 ttype2, OUT outtab TYPE2)
BEGIN
  v1 = SELECT c, d FROM :t1 WHERE d > :p1;     -- QUERY 1
  v2 = SELECT a, b FROM :t2 WHERE b < 1000;    -- QUERY 2
  CALL func2(:v2, v3);
  v4 = SELECT c, f FROM :v1, :v3 WHERE b > 0;  -- QUERY 3
  CALL func3(:v4, outtab);
END
3.3. SAP HANA Replication Technologies:
For analysing and reporting on top of SAP HANA, data has to be replicated from the source system into the SAP in-memory
database. SAP HANA supports three replication methods.
Trigger-Based Replication - Uses the standard SAP NetWeaver Landscape Transformation Replicator, based on
capturing database changes at a high level of abstraction in the source ERP system. Once data loading has started,
changes to the source system are captured in parallel with the replication process.
ETL-Based Replication - Uses SAP BusinessObjects Data Services to specify and load the relevant business data into SAP
HANA. Third-party data providers can be integrated using this method.
Log-Based Replication - Uses the Sybase replication method, based on capturing table changes from low-level database log
files. Database changes are propagated on a per-database-transaction basis and then replayed on the IMDB to
maintain consistency.
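The trigger-based approach can be illustrated with a minimal sketch (illustrative only, not SAP code): every write to the source also records a change entry, and a replicator job drains those entries into the target while new changes keep arriving. Class and variable names here are invented for the example.

```python
# Conceptual sketch of trigger-based replication: a "trigger" captures each
# write into a change log; a replicator drains the log into the target store.
class Source:
    def __init__(self):
        self.rows, self.change_log = {}, []

    def write(self, key, value):
        self.rows[key] = value
        self.change_log.append((key, value))  # the "database trigger" fires

def replicate(source, target):
    # Drain captured changes into the target in arrival order.
    while source.change_log:
        key, value = source.change_log.pop(0)
        target[key] = value

src, hana = Source(), {}
src.write(1, "order A")
src.write(2, "order B")
replicate(src, hana)
src.write(1, "order A*")   # change captured while replication is ongoing
replicate(src, hana)
print(hana)  # -> {1: 'order A*', 2: 'order B'}
```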
3.4. Row Store:
The row store is one of the relational engines; it stores data in row format and is interfaced from the calculation/execution layer
in the HANA architecture. It is a pure in-memory store whose data persistence is managed in the persistence layer.
Note: Page management is executed in the persistence layer, where the mapping between indexes and data volumes is done.
3.4.1. Row store Architecture/Block diagram:
The row store block diagram has five key components.
Transactional version memory is the heart of the row store; it contains temporary data versions used by database
operations such as write, insert and read. Write operations go primarily into transactional version memory, and are also
INSERTed into the persisted segment. Visible versions are moved from memory into the persisted segment permanently,
after which the outdated entries are cleared from transactional version memory. Transactional version memory is needed for
Multi-Version Concurrency Control (MVCC).
Note: MVCC is a concurrency control method commonly used by database management systems to provide concurrent
access to the database and in programming languages to implement transactional memory.
Segments are physical storage areas that contain the actual data (the contents of row store tables) in pages. Note: pages are
fixed-length storage locations.
The Page Manager is a process that manages memory allocation for pages in the segment area. It keeps track of used and free
pages for row store table data. Row store tables are linked lists of memory pages, and these pages are grouped into segments.
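A minimal sketch of this layout (illustrative only; the page capacity and names are invented for the example) shows rows appended to fixed-length pages, pages grouped into segments, and a locator much like a ROWID returned for each row:

```python
# Conceptual sketch of the row store layout: fixed-length pages linked into
# lists, grouped into segments, with a page manager handing out locators.
PAGE_CAPACITY = 4   # rows per page (illustrative value)

class PageManager:
    def __init__(self):
        self.segments = [[]]              # each segment is a list of pages

    def allocate_row(self, row):
        segment = self.segments[-1]
        if not segment or len(segment[-1]) >= PAGE_CAPACITY:
            segment.append([])            # allocate a fresh page
        segment[-1].append(row)
        seg_id, page_id = len(self.segments) - 1, len(segment) - 1
        return (seg_id, page_id)          # ROWID-like locator

pm = PageManager()
locators = [pm.allocate_row(f"row{i}") for i in range(6)]
print(locators[0], locators[5])  # -> (0, 0) (0, 1)
```

The first four rows fill page 0; rows five and six spill into a newly allocated page 1 of the same segment.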
Version memory consolidation works like a garbage collector for MVCC.
The persistence layer is invoked when write operations are performed by transactional version memory. This layer allows
savepoints to be taken for database operations.
3.4.2. Row Store Architecture Operations flow:
Write operations go mainly to transactional version memory, and INSERTs also write to the persisted segment.
The persisted segment contains data that may be seen by any ongoing transaction; it holds the data that was committed
before any active transaction started.
Version memory consolidation moves "visible versions" from transactional version memory into the persisted segment based on
commit ID, and clears the outdated record versions from transactional version memory.
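The interplay of version memory, commit IDs and consolidation can be sketched as follows (illustrative only; the data structures and names are invented, not SAP's implementation). Each write creates a version tagged with a commit ID; readers see the newest version committed at or before their snapshot; consolidation moves versions every active reader can see into the persisted segment:

```python
# Conceptual MVCC sketch: versioned writes, snapshot reads, and
# consolidation of old versions into a persisted segment.
version_memory = {}   # key -> list of (commit_id, value), in commit order
persisted = {}        # key -> last consolidated value

def write(key, value, commit_id):
    version_memory.setdefault(key, []).append((commit_id, value))

def read(key, snapshot_id):
    visible = [v for cid, v in version_memory.get(key, []) if cid <= snapshot_id]
    return visible[-1] if visible else persisted.get(key)

def consolidate(oldest_active_snapshot):
    # Versions visible to every active reader move to the persisted segment.
    for key, versions in list(version_memory.items()):
        keep = [(cid, v) for cid, v in versions if cid > oldest_active_snapshot]
        moved = [v for cid, v in versions if cid <= oldest_active_snapshot]
        if moved:
            persisted[key] = moved[-1]    # newest fully visible version wins
        version_memory[key] = keep
        if not keep:
            del version_memory[key]       # garbage-collect the empty entry

write("k", "v1", commit_id=1)
write("k", "v2", commit_id=2)
print(read("k", snapshot_id=1))   # -> v1
consolidate(oldest_active_snapshot=2)
print(read("k", snapshot_id=2))   # -> v2 (now served from the persisted segment)
```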
3.4.3. Indexes for Row Store tables:
Each row store table has a primary index that points to the ROWID of the row store table. A ROWID consists of the segment and
page ID of the respective record; using these, records are located in the persisted segment layer. These indexes are created
on the fly when the system loads tables into memory at startup, so they are volatile. All index
table definitions are stored in the row store table metadata. Secondary indexes can be created if needed.
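A small sketch (illustrative only; the slot component and sample data are invented) shows how such a volatile index is rebuilt at startup and used to resolve a key to its row:

```python
# Conceptual sketch: rebuild a row-store primary index at startup, mapping
# each primary key to a ROWID-like (segment, page, slot) locator.
def build_primary_index(segments):
    """segments: list of segments; each segment is a list of pages of (key, row)."""
    index = {}
    for seg_id, pages in enumerate(segments):
        for page_id, page in enumerate(pages):
            for slot, (key, _row) in enumerate(page):
                index[key] = (seg_id, page_id, slot)
    return index

def lookup(segments, index, key):
    seg_id, page_id, slot = index[key]
    return segments[seg_id][page_id][slot][1]

# One segment with two pages of (key, row) pairs.
segments = [[[(10, "Alice"), (11, "Bob")], [(12, "Carol")]]]
idx = build_primary_index(segments)
print(lookup(segments, idx, 12))  # -> Carol
```

Because the index lives only in memory, dropping it costs nothing at shutdown; it is simply derived again from the persisted segments on the next start.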
3.5. Column Store:
Column store is one of the relational engines. It is interfaced from calculation/ execution layer in HANA Architecture. It is a
pure in memory store where data persistence is managed in persistence layer.
The column store engine significantly improves read performance, and write performance as well. Data in the column store is
highly compressed. The column store does not operate on the real data files directly; access to them is virtualized. The column
store includes an optimizer and an executor which handle queries and execution plans.
The column store engine has two main components: the main store and the delta store.
The main store is highly compressed and read-optimized, so data is read from the main store itself.
The delta store is used for fast write operations. Data between the two stores is merged asynchronously; this
asynchronous merge moves the data from the delta store to the main store.
3.5.1. Column Store Operations flow:
The column store's two storage areas (main and delta) enable high compression and high write performance at the
same time.
Write operations go to the delta store; an update is performed by inserting a new entry into the delta storage.
Compression in the main store is done by creating a dictionary and applying further compression methods, which speeds up
data loads into the CPU cache and search operations. This compression is performed during the delta merge operation.
Read operations draw from both the main and delta stores and merge the results; the engine uses multi-version concurrency
control (MVCC) to ensure consistent reads.
3.5.2. Delta management in Column store:
The delta merge operation moves the changes (new and changed data) in the delta storage into the compressed, read-optimized
main storage. This operation is done asynchronously.
Even during the merge operation, the columnar table will still be available for read and write operations. To fulfil this, a second
delta and main storage are used internally.
Note : This merge operation can also be triggered manually with an SQL statement.
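The role of the second delta during the merge can be sketched as follows (illustrative only; the merge here is reduced to a sort, standing in for the real rebuild of the compressed main store):

```python
# Conceptual sketch of an asynchronous delta merge: while a new main store
# is rebuilt from old main + old delta, writes go to a second delta, so the
# table stays readable and writable throughout.
def delta_merge(main, delta):
    new_main = sorted(main + delta)   # rebuild the read-optimized main store
    return new_main, []               # the merged delta is now empty

main, delta = ["a", "c"], ["b", "d"]
delta2 = []                           # second delta takes writes during merge
delta2.append("e")                    # a write arriving mid-merge
main, delta = delta_merge(main, delta)
delta = delta2                        # switch over once the merge completes
print(main, delta)  # -> ['a', 'b', 'c', 'd'] ['e']
```

The write "e" is never blocked and never lost: it simply waits in the second delta until the next merge cycle picks it up.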
3.6. Persistence Layer:
The persistence layer is needed because main memory is volatile. It provides backup and restore functionality during database
restarts, power outages and so on, storing the data in a non-volatile way.
A single persistence layer serves both the row store and the column store in the in-memory computing engine. It provides
regular "savepoints" giving a full persisted image of the database at the time of the savepoint, logs capturing all database
transactions since the last savepoint (redo and undo logs), the ability to restore the database from the latest savepoint
onwards, and the ability to create "snapshots" used for backups.
System Restart and Population of In-memory Stores:
During a system restart, the last savepoint must be restored, the undo logs read so that uncommitted transactions saved with
the last savepoint can be undone, the redo logs applied, and the complete content of the row store loaded into memory.
For the column store, flags specify which tables are loaded during system restart. Only flagged tables are loaded into
memory at startup; if a table is flagged for loading on demand, the restore procedure is invoked on first access.
As the diagram above shows, at the time of a system crash, transactions that were committed but not fully written to the system
(transaction T1) require a redo operation, while uncommitted transactions (transaction T2) require an undo
operation: no record from T2 is added to the system, so the whole of transaction T2 has to be submitted
again.
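The T1/T2 recovery behaviour can be sketched compactly (illustrative only; the log format and values are invented for the example):

```python
# Conceptual sketch of restart recovery: redo committed transactions from
# the log since the last savepoint; never replay uncommitted writes (undo).
savepoint = {"x": 1}
log = [                               # log entries since the last savepoint
    ("T1", "write", "x", 2),
    ("T1", "commit"),                 # T1 committed before the crash -> redo
    ("T2", "write", "y", 9),          # T2 never committed -> undo (skip)
]

def recover(savepoint, log):
    db = dict(savepoint)
    committed = {entry[0] for entry in log if entry[1] == "commit"}
    for entry in log:
        if entry[1] == "write" and entry[0] in committed:
            _, _, key, value = entry
            db[key] = value           # redo committed work
        # writes of uncommitted transactions are simply not replayed
    return db

print(recover(savepoint, log))  # -> {'x': 2}
```

T1's write to x survives the crash; T2's write to y leaves no trace, so T2 must be submitted again, exactly as described above.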
3.7. Modeling:
HANA has two relational data stores, the row store and the column store, but modeling is possible for column tables only.
The Information Modeler, a key component of HANA studio for modeling, works with column tables for two
reasons:
The replication server creates tables in the column store by default
Data Services creates tables in the column store by default
SQL statements can define column tables directly, e.g. CREATE COLUMN TABLE, ALTER TABLE, etc.
System-generated tables are stored wherever they fit best; administrative tables, schema definition tables and the statistics
server tables are created in the row store.
A few administrative tables in the column store:
Schema _SYS_BI -> metadata of created views + master data for MDX
Schema _SYS_BIC -> some generated tables for MDX
Schema _SYS_REPO ->e.g. lists of active/modified versions of models
3.8. SAP HANA In memory Computing Studio:
This is where the databases are modeled; the Information Modeler/Composer runs on a Java-based Eclipse tool.
The Information Modeler/Composer has predefined features for modeling the database. It supports different
database views and allows publishing or consuming at four levels of modeling: attribute view, analytic view, analytic view
with enhanced attribute view, and calculation view.
These information models are virtual definitions and do not store physical data, but the Information Modeler allows physical
data to be loaded into them.
The Information Modeler/Composer allows importing and exporting data source schemas, and mass and selective loads. It
provides data provisioning for SAP business applications, which allows loading or replicating the application data.
3.8.1. SLT integration in SAP HANA:
SAP Landscape Transformation is a procedure to get SAP source data into HANA: the SAP BW DataSources
are replicated into HANA studio for modeling. Once the DataSources have been fetched into HANA, they can be modeled
according to business needs. To run an SAP BI system on HANA, BI must be on at least NetWeaver 7.3.
Note: For modeling in HANA studio, the SQL Script language is needed.
3.8.2. Information Modeler Terminology:
Data in Information Modeler can be represented by Attributes and measures.
Attributes are descriptive data, comparable to characteristics in SAP BW terminology. Measures are data that
can be quantified and calculated, known as key figures in SAP BW terminology.
Models/views can be represented by attribute views, analytic views and calculation views.
Attribute View – Attributes are modeled using attribute views. An attribute view can be regarded as a master data table that is
later linked to the fact table in an analytic view. In an attribute view, a measure can be defined as an attribute for modeling. In
simple terms, an attribute view can be treated as a dimension in SAP BW terminology. Attribute views support left outer, right
outer, full outer and text table joins, and support all cardinalities except N:N.
Analytic View – This can be regarded as a cube in which the fact table (transactional data) is connected to attribute views. The
analytic view itself does not contain data; rather, the data is stored in the column store or table views based on the analytic
view's structure. The properties of attributes and measures can be modified in the property tab.
All the views are organized in different folders under information modeler Packages.
There are three main views one can select from when previewing data.
Raw Data – table format of data
Distinct Values – graphical and text format identifying unique values
Analysis – select fields (attributes and measures) to display in graphical format
Calculation View:
Here views/models with custom functions and calculations can be defined. SQL Script helps in defining the calculation
view, but unlike SQL procedures, SQL Script here cannot change any data; calculation views are read-only.
Hierarchies: The Information Modeler supports leveled hierarchies, for structures of multiple attributes, and parent-child
hierarchies.
3.9. HANA Modeling Process Flow:
Import source system metadata – Physical table structures are created dynamically (a 1:1 schema definition of the source
system tables). SLT is used in the case of an SAP BW environment.
Provision data – Once the metadata has been replicated, the tables are loaded with content.
Create information models – With the physical data loaded into the system, modeling is performed; information models are
defined based on business requirements.
Deploy – Once modeling is done, column views are created and activated based on the information model structures. These
are the views which allow fast access to business data for reporting, and they include indexes for fast
access.
Consume – After the column views are activated, they are ready for reporting, depending on the choice of client tools: BICS,
SQL or MDX.
4. SAP HANA Backup and Recovery:
The SAP HANA database holds the bulk of its data in memory for maximum performance, but still uses persistent storage to
provide a fallback in case of failure.
During normal operation of the database, data is automatically saved from memory to disk at regular savepoints. Additionally,
all data changes are captured in the log, which is also saved to disk after each COMMIT of a database transaction.
After a power failure, the database can be restarted like any disk-based database and returns to its last consistent state by
replaying the log since the last save point.
Following actions will be done while restarting the system after power failure.
The last savepoint is reloaded
Uncommitted transactions are rolled back using the undo information contained in the savepoint; committed
transactions are rolled forward using the log
Data is loaded back into memory:
Tables are loaded lazily to keep the restart time short.
The complete content of the row store is loaded; column store tables are loaded if marked for preloading. If a table
has been marked for loading on demand, it is reloaded when first accessed.
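The preload-versus-on-demand policy above can be sketched as a small loader (illustrative only; the class, table names and policy representation are invented for the example):

```python
# Conceptual sketch: tables marked for preloading are in memory right after
# restart; load-on-demand tables are pulled in on their first access.
class TableLoader:
    def __init__(self, preload_tables, on_demand_tables):
        self.loaded = set(preload_tables)        # loaded during restart
        self.on_demand = set(on_demand_tables)

    def access(self, table):
        if table not in self.loaded and table in self.on_demand:
            self.loaded.add(table)               # lazy reload on first access
        return table in self.loaded

loader = TableLoader(preload_tables={"SALES"}, on_demand_tables={"ARCHIVE"})
print(loader.access("SALES"))      # -> True  (preloaded at restart)
print("ARCHIVE" in loader.loaded)  # -> False (not loaded yet)
print(loader.access("ARCHIVE"))    # -> True  (loaded on first access)
```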
Note: While save points and log writing protect your data against power failures, this does not help when the persistent storage
(disk) itself is damaged.
5. HANA Proof of Concept – Oil & Gas Industry
The results of the POC were outstanding: HANA delivered lightning-fast computation of time- and resource-heavy strategic
reports while providing uncompromising precision for the operational reporting needs.
It demonstrates how quicker business decisions and "speed to business" are possible using HANA at any level of an
organization.
The oil downstream business is characterized by sales and distribution of fuels, lubricants, bitumen and services to many
customers across different lines of business such as aviation, marine, commercial and retail. Considering
aviation alone, the business volumes are as depicted below: a business spanning about 90 countries and
about 1,200 airports, fuelling an airplane on average every 15 seconds, and recording several million sales orders
annually.
Consider a case where, as the management team of an oil company, you would like to settle disputes with an airport operator
and offer same-day dispute resolution, which can add a lot of brand value to the company.
The aim is to build reports on HANA that read enormous volumes of data, to see how flexibility, precision,
speed and granularity where needed can be demonstrated.
Dispute Resolution Report:
Sales orders are analyzed for disputes against the received POS data. Volumes in error are detected quickly and dispute
resolution initiated.
Operational report
Note: HANA offers parallel, thread-based data loading, which achieves very high loading speeds.
Report Execution:
Uploaded records with dispute: 33000
Upload speed: 30 seconds (on 4 parallel threads)
Query Speed: 0.5 seconds
References:
http://www.sap.com/platform/in-memory-computing
http://www.sdn.sap.com/irj/sdn/in-memory
http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/21575
http://www.sap.com/hana/index.epx
https://www.experiencesaphana.com/community/learn/content
http://www.jonerp.com/ for podcasts/discussions.