(future) in memory enterprise

Upload: joshscribd13

Post on 07-Jul-2018

  • 8/18/2019 (Future) in Memory Enterprise

    1/11

     

    WHITE PAPER

    In-Memory Database Technology: A Critical Success Factor for the Real-Time Enterprise

    Sponsored by: SAP

    Carl W. Olofson

    November 2012

    IDC OPINION

    No one needs to be told that the pace of business for most enterprises is increasing

    exponentially year after year and that new technology is needed for enterprises to

    keep up with both the speed of business and the flood of data that can lead to new

    business opportunities. To respond to this challenge, an enterprise must do the

    following:

      Adopt a strategy that enables reinvention as a real-time enterprise, capable of

    transacting business at the required pace and of taking maximum advantage of

    new business opportunities as they arise.

      Develop a platform capable of coordinating all the IT assets of the enterprise

    while flexibly adapting to changing business conditions.

      Ensure that the platform is undergirded by an in-memory database (IMDB)

    because only an IMDB is capable of keeping up with the speed and flexibility

    demands of the real-time enterprise.

      Consider adopting the SAP Real-Time Data Platform, which is grounded in SAP's IMDB technology, SAP HANA.

    IN THIS WHITE PAPER

    This white paper explores the issue of high performance in data management and the

    need for such performance in implementing a real-time enterprise. It describes

    the needs of the real-time enterprise and how they are best met by an in-memory

    database, which is managed by memory-based database management system

    (DBMS) technology. The document contrasts memory-based DBMS technology with

    disk-based DBMS technology, showing how the memory-based approach delivers

    much greater performance and simpler management than the disk-based approach

    without sacrificing database recoverability. It also shows how an in-memory database,

    managed by a memory-based DBMS, fits within the framework of the real-time

    enterprise.

    This paper also discusses the architecture of a data and decision management platform

    that includes in-memory database technology. Further, it suggests SAP HANA as

    an example of such in-memory database technology and the possible role of SAP's

    Real-Time Data Platform in the real-time enterprise.

    Global Headquarters: 5 Speen Street, Framingham, MA 01701 USA  P.508.872.8200  F.508.935.4015  www.idc.com

    2 #237635 ©2012 IDC

    SITUATION OVERVIEW

    The Real-Time Enterprise Challenge

    Enterprises are struggling to keep up with the demands of business in a globally,

    electronically connected world. Not only do events that require response come at an ever-

    quickening pace, but business intelligence (BI) about customers and competitors may be

    gleaned from vast amounts of ever-changing data on the Internet, if only it can be rapidly

    acquired and processed. Business managers know that their competitors are chasing this

    intelligence as fast as they are.

    How can a modern enterprise keep up with the pace of business demands and also

    gain an edge from timely business intelligence? The key is in the ability to acquire and

    ingest all the necessary data and to act on that data with as little delay as possible.

    This means dealing with streaming data from external sources and sensor and other

    machine-generated data from internal sources. It means adopting Big Data

    technologies to collect and ingest large amounts of business intelligence data quickly.

     Additionally, it means having a core data management platform that can enable

    applications to put all the pieces together and act in a timely manner. Memory-based

    database technology powering an IMDB needs to be at the heart of such a platform.

    What Is a Real-Time Enterprise?

     A real-time enterprise can act on events as they happen rather than wait for relevant

    information to be entered into the system, stored, compiled, and made available for

    query and reporting. A real-time enterprise needs to handle shifting external factors

    such as customer demand, supplier pricing and product availability, and operational

    costs as well as internal factors such as logistics, inventory, and production rates.

    This is not simply about producing an "executive dashboard." This is about marshalling IT resources based on the current situation on the ground as it changes.

    It is about automated decision generation and action and about supporting just-in-

    time tactical decisions. It affects both transactional operational systems and analytic

    systems, including operational BI.

    Big Data in Motion and at Rest

    Part of what drives the real-time enterprise is Big Data. Most people are familiar with

    Big Data at rest, which involves collecting large amounts of data that may be either

    streaming, machine-generated data or unorganized collections of content; filtering,

    ordering, and formatting that data; and making the data available to drive decisions

    and actions. Hadoop falls into this category, and most Big Data at rest solutions, including those based on Hadoop, are batch oriented or have a batch component to

    them. As a result, there is always a delay of minutes to hours between when the data

    arrives and when it is available to drive actions. For some classes of decisions and

    actions, this delay is acceptable; for others, it is not.

    Big Data in motion is different. This also involves large amounts of streaming data,

    but instead of ingesting the data and then examining it, this approach involves

    recognizing events in the data and taking immediate action. Such technology

    generally involves the use of a complex event processing (CEP) engine that drives

    actions in response to defined complex events. Of course, recognizing such events

    often requires context, so such systems need to be able to reference facts or patterns

    of facts that have previously occurred, such as those collected by Big Data at rest.

    Thus, the two types of Big Data are complementary.
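    The interplay between the two kinds of Big Data can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's CEP engine: the SpikeDetector class, the 1.5x threshold factor, and the three-reading run length are all hypothetical, and a plain dictionary of baselines stands in for the context collected at rest.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    value: float


class SpikeDetector:
    """Fires when a sensor exceeds its stored baseline on several
    consecutive readings (a simple stand-in for a complex event)."""

    def __init__(self, baselines, factor=1.5, run_length=3):
        self.baselines = baselines    # context: facts collected "at rest"
        self.factor = factor          # hypothetical threshold multiplier
        self.run_length = run_length  # consecutive readings required
        self.recent = {}              # per-sensor sliding window of hits

    def on_event(self, reading):
        baseline = self.baselines.get(reading.sensor_id)
        if baseline is None:
            return None               # no context, so no decision
        window = self.recent.setdefault(
            reading.sensor_id, deque(maxlen=self.run_length))
        window.append(reading.value > baseline * self.factor)
        if len(window) == self.run_length and all(window):
            window.clear()            # avoid re-firing on the same run
            return f"ALERT {reading.sensor_id}"
        return None


detector = SpikeDetector({"pump-1": 10.0})
stream = [Reading("pump-1", v) for v in (9.0, 16.0, 17.0, 18.0, 9.5)]
actions = [detector.on_event(r) for r in stream]
```

    Each arriving reading is judged immediately, but only against previously stored facts, which is why the lookup of that context has to be fast.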

    Right-Time Decision and Actions

    For real-time decisions to be made and actions to be taken, a system that combines the

    two forms of Big Data must be able to hold at least the most immediately relevant

    elements of the Big Data at rest in a database that can respond immediately to requests.

    For reasons that will soon become apparent, such data should reside in an IMDB.

    Classic operational applications are designed to perform preprogrammed functions in

    a fixed sequence, with a few conditions here and there modifying that sequence

    slightly. A real-time enterprise requires applications that are driven by decisions, most

    of them automated, based on the right criteria. Some of the criteria will call for up-to-

    the-minute data, while others can tolerate various levels of latency.

    In addition to the automated decisions of operational applications, human decisions

    also vary in terms of their complexity, the number of people involved, and the amount

    of data and its required timeliness. Figure 1 illustrates the range of decisions that IT

    systems need to support.

    FIGURE 1

    IDC's Decision Management Framework

    [Figure: strategic, operational, and tactical decisions arranged along four dimensions: degree of automation, scope and degree of risk, level of collaboration, and number of decisions.]

    Strategic decisions set the long-term directions for the organization, a product, a service, or an initiative and result in guidelines within which operational decisions are made.

    Operational decisions focus on a specific project or process and result in the formation of a type of policy or rule that drives tactical decisions.

    Tactical decisions must apply the policy or rule in a specific case, which lends itself to automation.

    Source: IDC, 2012

     As may be seen, this framework highlights three decision types:

      Strategic decisions tend to involve the collection and digestion of large amounts

    of data over a considerable period of time, with lots of collaboration among

    interested parties. A data warehouse is usually involved. The smallest number of

    total decisions are strategic.

      Operational decisions  involve choices made within the framework of the larger

    strategy, usually by line managers. They require somewhat less data, but on a

    more timely basis. They may turn to data marts and to some Big Data analysis.

    This type of decision ranks second in terms of the total number of decisions made.

      Tactical decisions are made "on the ground" and in the heat of action. They must

    be made immediately, based on key relevant information that has just arrived,

    compared and processed against known patterns and facts. This involves streaming

    data and a CEP engine to drive the decisions by examining held data that must

    reside in an extremely low-latency database (that is, an IMDB). This represents by

    far the largest number of decisions made.

    Enterprise Information Management

    To nimbly manage all these levels of decision making, an enterprise must have an

    enterprise information management (EIM) strategy that embraces a means of defining

    and coordinating data across a variety of different types of databases and data stores.

    Figure 2 illustrates the kinds of data management involved. Note that in this figure,

    "automated" and "real-time" decisions are both examples of "operational decisions," as

    referenced in Figure 1.

    FIGURE 2

    Decision Management in an EIM Context

    [Figure: an EIM Metadata Hub and Coordinator shown with four data stores: Streaming + CEP (Big Data in Motion); Large Volume Volatile Data (Big Data at Rest); Large Volume Static Data (Data Warehouse + CM); and Fixed Operational Data (Transactional DBMS). Decision labels: Real-Time Decisions, Automated Decisions, Tactical Decisions, Strategic Decisions.]

    Source: IDC, 2012

     An EIM strategy requires a unified data management platform, undergirded by a

    system of data coordination among the databases in the environment, driven by

    common metadata. It should be noted that for automated and real-time decisions to

    be executed at the right time, contextual data must be available in an instant. This

    requires IMDB technology.

    In-Memory Database Technology

    In-memory database technology involves maintaining current live data in memory

    rather than on disk. The data of record is actually the data in memory all the time.

    Such an approach requires a memory-based DBMS rather than a disk-based DBMS.

    Memory-Based DBMS Versus Disk-Based DBMS

    Most DBMS products in use today are disk-based DBMS products. They are

    optimized for the management of data on disk and minimize disk I/O waits — a major

    design point —  while ensuring consistency based on the committed data in the

    database. A memory-based DBMS, by contrast, is optimized for the organization and

    management of data in memory rather than on disk. This does not mean that there is

    no disk in the picture at all. Spinning disk is often used for the transaction log (to

    ensure recoverability), and for very large databases, seldom-used data may be paged

    to disk to free up memory.

    Disk-Based DBMS

     A disk-based DBMS manages the data in memory for purposes of mapping it to disk.

    This means that all the data must be seen as copies of data on disk and that when

    the data is changed, it needs to be written back to disk at some point. Because data

    is stored on disk in ways that optimize I/O speed, it is not organized in ways that

    make it easy to move from one table row to another in memory. Every time the

    database server looks for a row, it needs to figure out where that row belongs on disk,

    and then whether or not it is already in the buffer. If the row is not in the buffer, the

    database server needs to flush the buffer and load the needed data.

     As a result, most of the instructions the DBMS executes to respond to a data request

    have nothing to do with the request and everything to do with disk and buffer

    management. In fact, even for a simple query, the DBMS executes about 10

    instructions for disk and buffer management for every one instruction that actually

    involves getting the data and returning it to the requester.
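    The bookkeeping described above can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions, not the design of any real DBMS: the class names, the four-row page size, and the two-page LRU buffer are all hypothetical, and it does not attempt to reproduce the instruction ratio cited in the text.

```python
from collections import OrderedDict

ROWS_PER_PAGE = 4  # hypothetical page size


class DiskBackedTable:
    """Rows live on 'disk' pages; reads go through a small LRU buffer pool."""

    def __init__(self, rows, buffer_pages=2):
        self.disk = {}
        for row_id, row in rows.items():
            self.disk.setdefault(row_id // ROWS_PER_PAGE, {})[row_id] = row
        self.buffer = OrderedDict()
        self.buffer_pages = buffer_pages
        self.page_loads = 0

    def get(self, row_id):
        page_no = row_id // ROWS_PER_PAGE            # map the row to its page
        if page_no in self.buffer:                   # is the page buffered?
            self.buffer.move_to_end(page_no)         # keep LRU order current
        else:
            if len(self.buffer) >= self.buffer_pages:
                self.buffer.popitem(last=False)      # evict the oldest page
            self.buffer[page_no] = self.disk[page_no]  # load from "disk"
            self.page_loads += 1
        return self.buffer[page_no][row_id]          # finally, fetch the row


class InMemoryTable:
    """The table itself is the data of record; a read is a single lookup."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def get(self, row_id):
        return self.rows[row_id]


rows = {i: f"row-{i}" for i in range(12)}
disk_table = DiskBackedTable(rows)
mem_table = InMemoryTable(rows)
for row_id in (0, 5, 9, 1):
    assert disk_table.get(row_id) == mem_table.get(row_id)
```

    Both tables return the same rows, but every read on the disk-backed table passes through page mapping and buffer management before it touches the data, while the in-memory table goes straight to the row.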

    Memory-Based DBMS

     A memory-based DBMS operates quite differently from a disk-based DBMS. Its normal

    operations involve very little I/O, so optimization focuses on keeping the number of computer instructions executed to a minimum. Currently, memory-based DBMS

    products vary widely with respect to the way they organize the database in memory.

    For a memory-based DBMS, each session still has its own buffer, but the database

    itself serves as the standard or common buffer. Such a DBMS requires no reading

    from disk, no mapping to disk, no pages, no flushes. When changed data is

    committed, that data is simply copied to the database in memory, and the operation is

    complete. In some cases, the database may be too big to manage in memory or the

    cost of the total amount of memory required may be unacceptable. In such cases, the

    least volatile data is relegated to disk, and the entire system operates as if all the

    database were in memory, with background services synchronizing the disk-swapped

    data. This way, database operations are not slowed down by the swapping activity. It

    is for this reason that even when a disk-based database is entirely in buffer memory,

    a memory-based database that has all the required data in memory will run an

    average of at least 10 times faster than a disk-based database.
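    The commit path of a memory-based DBMS, in which each session holds private changes and a commit simply copies them into the shared in-memory database, might look like the following sketch. InMemoryDatabase and Session are hypothetical names, and the sketch deliberately omits locking, isolation levels, and logging, all of which a real product must provide.

```python
class InMemoryDatabase:
    """The shared in-memory rows are the data of record."""

    def __init__(self):
        self.rows = {}

    def session(self):
        return Session(self)


class Session:
    def __init__(self, db):
        self.db = db
        self.pending = {}  # this session's private buffer of uncommitted writes

    def write(self, key, value):
        self.pending[key] = value

    def read(self, key):
        # A session sees its own uncommitted writes first.
        if key in self.pending:
            return self.pending[key]
        return self.db.rows.get(key)

    def commit(self):
        # No page mapping and no flush: copy into the shared database.
        self.db.rows.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


db = InMemoryDatabase()
writer, reader = db.session(), db.session()
writer.write("sku-1", 5)
before_commit = reader.read("sku-1")
writer.commit()
after_commit = reader.read("sku-1")
```

    The other session cannot see the change until the commit, after which it is visible immediately because there is only one copy of the committed data.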

    In-Memory Database

    Some memory-based DBMSs are designed to handle databases that are larger than

    the available memory by swapping the least frequently used data to disk. An IMDB is

    a database that is optimized by a memory-based DBMS so that the entire database

    may be kept in memory. Examples exist of IMDBs that show upwards of 10 times

    performance improvements over disk-based databases containing the same data,

    and executing the exact workloads, even when the disk-based databases are holding

    all their data in buffer so there is no I/O except for the transaction log. Some IMDBs

    show upwards of 200 times performance improvements for both query and update.

    These results vary, of course, based on architecture and workload.

    IMDB Recoverability Techniques

    One objection often raised to a memory-based database, and especially an IMDB, is

    that of recoverability. It is assumed that a disk-based database is more recoverable

    because its data resides on disk. This is simply not true. Most modern disk-based

    databases keep the majority of their data in buffers most of the time, writing changes

    to a log. If the system fails, one must bring it up again and use the transaction log to

    write all the changes, in sequence, to disk before the system may be used.

     An IMDB also has a transaction log. In many cases, the database server writes

    changed data to the transaction log and takes periodic background snapshot backups

    (which don't take cycles from the database server itself). Recovery consists of

    reloading the snapshot and rolling forward the logged transactions, which, in an

    in-memory system, takes far less time than when writes to disk are involved.
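    Snapshot-plus-log recovery can be sketched as follows. This is a simplified model under stated assumptions: RecoverableStore is a hypothetical class, Python lists and dictionaries stand in for the durable log and snapshot files, and the snapshot is taken inline rather than by a background process.

```python
import copy


class RecoverableStore:
    def __init__(self):
        self.data = {}          # live in-memory database
        self.log = []           # stands in for the durable transaction log
        self.snapshot = {}      # stands in for the last snapshot backup
        self.snapshot_pos = 0   # number of log records the snapshot covers

    def put(self, key, value):
        self.log.append((key, value))  # record the change in the log first
        self.data[key] = value

    def take_snapshot(self):
        self.snapshot = copy.deepcopy(self.data)
        self.snapshot_pos = len(self.log)

    def crash(self):
        self.data = {}          # memory contents are lost

    def recover(self):
        # Reload the snapshot, then roll forward the records logged after it.
        self.data = copy.deepcopy(self.snapshot)
        replayed = self.log[self.snapshot_pos:]
        for key, value in replayed:
            self.data[key] = value
        return len(replayed)


store = RecoverableStore()
store.put("a", 1)
store.put("b", 2)
store.take_snapshot()
store.put("a", 3)
store.put("c", 4)
store.crash()
replayed = store.recover()
```

    Only the two changes made after the snapshot need to be replayed, which is why more frequent snapshots shorten recovery time.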

    This is not the only method of recovery. Many IMDBs are deployed on clusters of

    servers that act as data servers as well as standby servers for each other. Still others

    combine this approach with asynchronous log writes. This means that they don't wait

    for the write to complete successfully before continuing. In the worst case, where the

    entire cluster goes down, the log, together with snapshot backups, can be used to

    recover the database within some designated interval before failure.

    If the interval is zero, then the writes are, effectively, synchronous. Otherwise,

    database operations are timed so that delays occur only if the log gets backed up and

    can't meet the interval requirement. Most databases can tolerate a small interval of

    potentially lost data (the system would need to be completely busy at the time of

    failure for data to be lost); even if very small intervals are allowed, the performance

    boost is huge.
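    The trade-off between asynchronous log writes and a bounded loss interval can be illustrated with a small deterministic simulation. BoundedAsyncLog is a hypothetical class, time is measured in logical ticks, and for simplicity the flush happens inside the commit call when the interval would be exceeded; a real system would normally flush in the background and stall commits only when that flush falls behind.

```python
class BoundedAsyncLog:
    def __init__(self, max_lag):
        self.max_lag = max_lag  # allowed loss interval, in logical ticks
        self.pending = []       # (commit_time, record) pairs not yet durable
        self.durable = []       # records safely written to the log
        self.stalls = 0         # commits that had to wait for a flush

    def commit(self, now, record):
        # Wait for the log only if the oldest unflushed record would
        # otherwise fall outside the allowed interval.
        if self.pending and now - self.pending[0][0] >= self.max_lag:
            self.flush()
            self.stalls += 1
        self.pending.append((now, record))
        if self.max_lag == 0:
            self.flush()        # a zero interval is effectively synchronous

    def flush(self):
        self.durable.extend(record for _, record in self.pending)
        self.pending.clear()

    def lost_on_crash(self):
        return [record for _, record in self.pending]


log = BoundedAsyncLog(max_lag=3)
for tick, record in enumerate(["t0", "t1", "t2", "t3", "t4"]):
    log.commit(tick, record)

sync_log = BoundedAsyncLog(max_lag=0)
sync_log.commit(0, "x")
```

    With an interval of three ticks, only one of the five commits waits on the log, and a crash at that point would lose just the two most recent records; with an interval of zero, nothing is ever pending.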

    Benefits of Memory-Based Database Technology

    Those who adopt memory-based database technologies can expect to realize the

    following key benefits:

      Improvements of 10 to 200 times in terms of throughput even against a fully

    optimized disk-based database system. This means more and faster tactical decisions, more and better operational decisions, and far wider analysis

    capability for strategic decisions.

      Ability to incorporate Big Data into decision-making activities —  from real-time

    decisions that exploit immediate sales opportunities, improve service, and avoid

    risk to predictive analysis to reveal future market opportunities.

      Far less disk space is required for the memory-based database, and especially

    for an IMDB, resulting in a cost savings there.

      Although a memory-based solution will require more main memory and

    processors than a corresponding disk-based system, the total memory increase

    is not as great as one might think because a good deal of duplicated memory

    and overhead is eliminated, so this is not a matter of substituting memory for

    disk; the net hardware cost difference should be favorable even when taking

    memory cost into account.

      A great deal of staff time is required to manage a disk-based database. DBAs

    spend a lot of time on activities such as building and rebuilding indexes, mapping

    data to partitions, unloading and reloading data, reallocating data across

    volumes, and so on. All this effort is eliminated, along with the corresponding

    storage management effort.

      A memory-based database is far more nimble than a disk-based database. When

    schema changes or rapid data growth happens to a disk-based system, they

    generally require a reorganization that involves heavy disk volume work.

    Memory-based databases, and especially IMDBs, can generally adjust fairly

    dynamically to both schema changes and data growth, though the latter case

    may require adding servers to the cluster.

    The Right-Time Data Solution

    Memory-based database technology is essential to enabling the application

    environment to keep up with the pace of business. An IMDB is the only kind of

    database that can be used with a streaming data-driven system without slowing

    down. The speed, simplicity, and nimble nature of memory-based databases, and

    IMDBs in particular, make them critical success factors for achieving the real-time

    enterprise, as illustrated in Figure 3.

    FIGURE 3

    Elements of a Real-Time Enterprise

    [Figure labels: Streaming Data; CEP Engine; Business Action; Immediate; Query; Hadoop; Data Warehouse; Enterprise Analytics; IMDB; High-Speed Transactions and Automated Decisioning; Real-Time BI.]

    Source: IDC, 2012

    Examples of the Real-Time Enterprise in Action

    The following examples of real-time enterprise applications illustrate the principles

    outlined previously. They are not specific use cases, but they incorporate elements of

    a number of use cases with which IDC is familiar.

      Real-time retailing. A retail firm tracks inventory in its stores by capturing RFID

    information from pallets as they arrive at the loading dock and sales at point-of-

    sales terminals (POSTs). The sales data also reveals volumes and patterns of

    sales moment by moment. Data about sales and inventory by store is loaded into

    an IMDB, and analytic software determines whether changes in sales volumes

    suggest a price change and whether trends compared with inventory suggest the

    need to restock, and if so, which stores and from which warehouses. Such

    operations minimize inventory-related costs and maximize competitiveness

    without ill-considered price changes up or down.

      Real-time trading. A portfolio management company maintains portfolio

    holdings and rules in an IMDB. Such rules govern how frequently trades may

    occur, how much risk is tolerated, what kinds of issues are to be considered for

    inclusion in the portfolio, etc. The firm also receives streaming data about stock

    trades. It records data about issues of interest in the IMDB and looks for

    interesting trends in share prices. When trends are found, it compares changes

    and the algorithmic buy-or-sell suggestion against each portfolio to determine

    whether, for that portfolio, a trade is warranted based on its rules. If so, the trade

    is executed. All this is done in milliseconds.

      Real-time logistics.  A trucking firm receives real-time data about the location

    and condition of each truck on the road as well as traffic conditions, which it

    maintains in an IMDB, along with the contents of all trucks on the road and their

    delivery routes and schedules. Changes in traffic, new or canceled delivery

    orders, exceptional events such as accidents or breakdowns, and the fuel level of

    each truck affect orders that may be pushed out to the drivers that change their

    routes and schedules, all subject to moment-by-moment change. The result is

    more nimble pickup and delivery, faster response to problems, optimal truck

    routing, and fuel cost savings.
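    As a rough illustration of the kind of automated tactical decision these examples describe, the retail restocking check could be sketched as below. The function name, its parameters, and the simple "expected sales during lead time plus safety stock" rule are assumptions made for illustration; the paper does not specify how such analytic software would decide.

```python
def restock_orders(inventory, sales_rate, lead_time_hours, safety_stock=0):
    """Return {store: units to order} for stores whose on-hand inventory
    will not cover expected sales during the resupply lead time."""
    orders = {}
    for store, on_hand in inventory.items():
        expected_sales = sales_rate.get(store, 0) * lead_time_hours
        shortfall = expected_sales + safety_stock - on_hand
        if shortfall > 0:
            orders[store] = shortfall
    return orders


# Hypothetical current state, as it might be held in an IMDB.
inventory = {"store-1": 40, "store-2": 5}   # units on hand
sales_rate = {"store-1": 2, "store-2": 3}   # units sold per hour
orders = restock_orders(inventory, sales_rate,
                        lead_time_hours=6, safety_stock=4)
```

    Because the inputs change moment by moment, a rule like this is only useful if inventory and sales figures can be read with very low latency each time it runs.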

    SAP IMDB Technology

    SAP is taking a multifaceted approach to the challenge of providing memory-based

    DBMS technology to its customers. For SAP application customers, the company

    offers SAP HANA, an IMDB aimed at providing very fast and efficient data operations

    for both analytical and operational workloads. As of this writing, SAP HANA primarily

    supports SAP's analytic applications, but the company plans to deliver transactional

    support for ERP customers by the end of the year.

     At the present time, SAP HANA is not a standalone DBMS; rather, it is used in

    conjunction with a full-featured relational DBMS (RDBMS). SAP recommends use of

    SAP Sybase IQ as the storage database for HANA's analytic workloads and SAP

    Sybase Adaptive Server Enterprise for the ERP workloads. Going forward, the

    company plans to bring these technologies closer together to form the SAP Real-

    Time Data Platform.

    Nonetheless, using SAP applications with SAP HANA positions the user to address

    the extreme performance and flexibility demands of the real-time enterprise.

    FUTURE OUTLOOK

    The idea that memory-based approaches represent the future of database technology is

    no longer seriously disputed. The only questions have to do with the proper form such

    technology should take, and in most cases, the answer tends to vary depending on the

    data management problem one is trying to solve. The result is that over the next few

    years we will see a number of different memory-based DBMSs enter the market. Some

    will focus on small but very complex analytic workloads. Some will focus on large analytic

    workloads. Others will focus on transactional workloads, especially those that have some

    real-time dimension. Some will come from established DBMS vendors, and others will

    arise from start-ups that no one has ever heard of before.

    The challenge for users is to sift through the different memory-based DBMSs and find

    those that most effectively address the problem in question and that come from a

    vendor that can be trusted to be around for a while — preferably one with a track

    record of success. Users should not expect to standardize on one memory-based

    DBMS, much less one IMDB, for all workloads; rather, they should pick the right tool

    for the job and look to strategic data integration applied for enterprise information

    management to reconcile and coordinate the data.

    Cloud Architectures

     As memory-based database technologies, and especially IMDBs, evolve, they are

    perfect for both public and private cloud deployments because they eliminate the

    classic problem of disk-based DBMSs: dedicated, hard-to-move or hard-to-change

    storage assets. They fully embrace the cloud concept of resource fungibility that is the

    whole point of cloud virtualization. By contrast, disk-based DBMSs make cloud

    management much more complicated and constrained.

    Evolving Hardware Architectures

    We should also bear in mind that just as memory-based DBMS software engineers

    are evolving their technology to take maximum advantage of today's processor and

    memory configurations, hardware engineers are working to evolve better processor and memory configurations in service of IMDBs. This dance of innovation will result in

    a dizzying evolution of this technology for some time to come.

    CHALLENGES/OPPORTUNITIES

     As has been mentioned, memory-based DBMS technologies will emerge from all

    quarters — some old, some new. SAP will need to face the competitive challenges of

    these technologies. This does not mean that SAP needs to eclipse them all. In some

    cases, SAP's IMDB technology will find a synergistic relationship with specialized

    memory-based DBMS technology offered by other vendors. In other cases, especially

    those involving the core of the SAP Real-Time Data Platform, SAP's IMDB technology, SAP HANA, must be seen as the clear choice to drive enterprise data.

    CONCLUSION

    In the coming years, there will be a growing emphasis on real-time processing as

    a key to success in the global, Internet-driven business world, regardless of what

    business one happens to be in. A critical success factor in addressing this

    requirement is memory-based DBMS technology. Many such technologies will

    emerge that emphasize, to various degrees, speed, volume, flexibility, resiliency,

    reliability, and consistency. In making strategic decisions about the future direction of

    enterprise applications, enterprises should consider the following:

      Some memory-based DBMSs will be just perfect for specific workloads within the

    enterprise. Absolute uniformity across the enterprise in this regard is not

    required. Pick the best tool for the job.

      All the data management for both operational applications and analytic workloads

    should ultimately rest on a platform that enables consistent, reliable data to be

    delivered where it is needed in a timely manner, where "timely" may range from

    days to hours to microseconds.

      Such a platform, at its heart, should have a single IMDB technology that can

    handle the speed and variety requirements of the data in the platform.

      The SAP Real-Time Data Platform, driven by SAP HANA, which offers an

    evolutionary path from a disk-based RDBMS and includes IMDB technology, is a

    possible candidate in this regard.

    Copyright Notice

    External Publication of IDC Information and Data — Any IDC information that is to be

    used in advertising, press releases, or promotional materials requires prior written

    approval from the appropriate IDC Vice President or Country Manager. A draft of the

    proposed document should accompany any such request. IDC reserves the right to deny approval of external usage for any reason.

    Copyright 2012 IDC. Reproduction without written permission is completely forbidden.