
HP NonStop server virtualization primer

Contents

Abstract
Introduction
Virtualization
  Provisioning services
Database services
  Data access managers
  HP NonStop SQL Database
  Database parallel processing using partitioning
  Online database administration
Transaction Management Facility
  Virtualized resource
  Transaction logs
  Database backups
  Transaction commit
  Transaction backout
  Disk Recovery
  File Recovery
  Disaster protection
If a failure occurs
Self-management
  Self-configuration
  Self-optimization
  Self-diagnosis
  Self-healing
  Self-protection
  Automated incident reporting and ticketing
  Monitoring software
Enterprise management
Conclusion
For more information


Abstract

Herein, you will read about clustered HP NonStop servers and the single-image virtualization technologies that are designed into every product, out of the box. These technologies provide a computing environment that allows instant takeover when a node fails and linear scalability spanning from two to 4,080 nodes in a single local or geographically distributed cluster. In addition, you will read about industry-leading self-management and self-healing technologies used to reduce or eliminate the administrative tasks associated with application, database, and system management.

Introduction

The NonStop server consists of a highly virtualized, single-image, clustered environment that scales linearly from two to 4,080 nodes. For ease of management, the cluster is divided into segments. Each segment contains two to 16 nodes. Up to 255 segments can be clustered together over a high-speed, low-latency interconnect, as well as local or wide area networks.

24x7 operation with NonStop servers is enabled with complete active redundancy throughout the architecture, with no single point of failure. Active redundancy differs from the more conventional standby redundancy in several respects. Active redundancy

• Ensures that all components are always in active use, rather than configuring half the components to sit idle in a hot standby mode

• Allows the option of sizing for degraded performance in the case of a failure and automaticallybalancing and rebalancing the workload across the cluster

• Eliminates the possibility of downtime resulting from a latent fault when a previously unusedcomponent is put into service after a failure

Many services are implemented as service pairs (two collaborating instances of a service running in different nodes that checkpoint state data) or as per-node services. If one service or its node fails, the request is automatically rerouted to the remaining service, thereby fully hiding the node failure from the NonStop server clients for 24x7 operation. This unique and patented technology allows transparent and instantaneous takeovers in the case of a failure, rather than complex, visible, and lengthy failovers.
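The service-pair idea described above can be sketched in a few lines. This is a hypothetical illustration, not HP's implementation: the primary applies each update locally, checkpoints its state to the backup, and the backup takes over instantly with the checkpointed state if the primary's node fails. All names here are invented for the sketch.

```python
# Illustrative sketch of a checkpointed service pair (names are assumptions).
class ServiceInstance:
    def __init__(self, node):
        self.node = node
        self.state = {}      # checkpointed state data
        self.alive = True

    def apply(self, key, value):
        self.state[key] = value

class ServicePair:
    """Two collaborating instances running in different nodes."""
    def __init__(self):
        self.primary = ServiceInstance("node-A")
        self.backup = ServiceInstance("node-B")

    def request(self, key, value):
        if not self.primary.alive:
            # Instant takeover: the backup already holds the checkpointed
            # state, so no restart or state rebuild is needed.
            self.primary, self.backup = self.backup, self.primary
        self.primary.apply(key, value)
        if self.backup.alive:
            self.backup.apply(key, value)   # checkpoint state to the backup
        return self.primary.state[key]

pair = ServicePair()
pair.request("balance", 100)
pair.primary.alive = False                   # simulate a node failure
assert pair.request("pending", 200) == 200   # request transparently rerouted
assert pair.primary.state["balance"] == 100  # checkpointed state survived
```

The point of the sketch is that takeover is a pointer swap, not a recovery procedure, because state was continuously checkpointed before the failure.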

Virtualization is used for every service provided in the system, including devices, the operating system, the database, and application environments. This virtualization technology helps ensure that the NonStop server can hide all failure conditions from the end user and the application programmer. In addition, virtualization enables the NonStop server to provide load balancing on a per-transaction level, as well as single-image management of the cluster, the database, and application services.

High-speed and ultra-reliable messaging is an essential part of this architecture, where each node in the cluster runs its own copy of the operating system, in a way that is similar to what occurs in blade environments. As in many other clustered architectures, the nodes communicate by passing messages, but they do so at a much lower level than is the case with expensive IP layers. The messages used in the NonStop architecture are always protected by a checksum, end to end. The checksum is calculated at the point of origin and checked at each communication point used to deliver the message to its end destination. At the destination, the checksum is validated and removed before delivering the message to its consumer.
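The end-to-end checksum flow can be modeled simply. This is an assumed, simplified sketch, using CRC32 as a stand-in for whatever checksum the real system computes: the checksum is attached at the origin, verified at each hop, and stripped only on final delivery.

```python
# Sketch of end-to-end message checksumming (simplified; CRC32 stands in
# for the real algorithm, and the hop/delivery functions are invented).
import zlib

def send(payload: bytes) -> bytes:
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return payload + crc                 # checksum travels with the message

def verify_at_hop(frame: bytes) -> bytes:
    payload, crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise IOError("corruption detected in transit")
    return frame                         # intermediate hops forward intact

def deliver(frame: bytes) -> bytes:
    frame = verify_at_hop(frame)         # validate at the destination...
    return frame[:-4]                    # ...and remove before delivery

msg = b"transfer $100"
assert deliver(verify_at_hop(send(msg))) == msg
```

Because every hop re-verifies, corruption introduced anywhere along the path is caught before the consumer ever sees the message.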

The node-to-node and segment-to-segment (up to 96 segments can be connected via fabrics, and larger clusters use WAN technology) communication occurs over two switched, independent fabrics that use wormhole-routing technology. Both fabrics are in use at all times when available, and traffic is automatically switched to the remaining fabric if a fabric failure occurs. Once repaired, the operating system automatically validates the quality of the repaired fabric before automatically bringing it back into use and re-enabling use of both fabrics for internode communication.

Wormhole-routing technology allows for the lowest possible message latency in the fabrics. The fabric switches route each message, depending on the destination, before the complete message has arrived at the switch. By comparison, other store-and-forward fabrics must receive the complete message before sending it to the next stop on its journey, thereby introducing undesirable latency.

State-of-the-art data management uses a Fibre Channel–based disk architecture built on RAID-1 (host-based disk mirroring) technology, with a separate data access manager for each RAID-1 disk. Disk writes are done in parallel to both disk drives in the RAID-1 disk. Reads are done from the disk drive whose disk arm is closer to the data to be read. All data is stored together with end-to-end checksum information that is created and validated by the data access manager. The use of one data access manager per RAID-1 disk is a key contributor to the linear scalability and availability capabilities of the NonStop server. The NonStop architecture has no need for cache synchronization or a shared resource, such as a distributed lock manager, which becomes a bottleneck in other architectures as the server scales.
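The mirrored-write, closer-arm-read policy just described can be illustrated with a toy model. This is not HP code; the classes and the seek-distance heuristic are assumptions made for the sketch.

```python
# Toy model of one data access manager owning a RAID-1 pair: writes go to
# both drives; reads go to the drive whose arm is closer to the block.
class Drive:
    def __init__(self):
        self.blocks = {}
        self.arm = 0                      # current arm position (block number)

    def write(self, block, data):
        self.blocks[block] = data
        self.arm = block

    def read(self, block):
        self.arm = block
        return self.blocks[block]

class DataAccessManager:
    def __init__(self):
        self.mirror = (Drive(), Drive())

    def write(self, block, data):
        for drive in self.mirror:         # parallel write to both halves
            drive.write(block, data)

    def read(self, block):
        # Pick the drive with the shorter seek distance to the target block.
        drive = min(self.mirror, key=lambda d: abs(d.arm - block))
        return drive.read(block)

dam = DataAccessManager()
dam.write(10, b"row-a")
dam.write(500, b"row-b")
dam.mirror[0].arm = 12                    # pretend the arms drifted apart
assert dam.read(10) == b"row-a"           # served by the closer drive
```

Either drive can satisfy any read, which is also why losing one half of the mirror is invisible to the application, as the fault-zone discussion below notes.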

Transparent failure management is assured by mapping particular types of hardware failures to specific fault zones. Some faults result in the failing resource being temporarily unavailable to the application but have no visible impact on the end user or the application. An example is the failure of one half of a mirrored pair of RAID-1 disks.

Other faults result in a node being temporarily unavailable. An example is the failure of the operating system instance or memory in one node. In such a failure, the node becomes unavailable while the other nodes continue processing transactions, and the workload associated with the failed node is shifted to other nodes in the cluster. This fault containment occurs for both hardware and software faults.

Processing power in HP Integrity NonStop servers is provided by state-of-the-art Intel® Itanium® processors. Each processor is considered a separate node, making it possible to configure a NonStop server so that up to 50 percent of the nodes can fail without a single virtualized service failing.

The clustered operating system hides the effects of node failures, making them invisible to applications on NonStop servers. The impact of a failure on transaction processing capacity depends on the configuration. As a rule of thumb, the larger the load capacity, the less a failure affects the number of transactions that the server can process. Of course, enough resources need to be configured to deliver adequate performance in case of a failure.

Uncompromising data integrity is assured with each of the nodes in a NonStop server containing private, non-shared memory. The memory uses standard error-correcting code (ECC) protection and is continuously scanned for latent data corruption. If uncorrectable data corruption is found, the suspect slice in the node is immediately taken out of service. When a node is taken out of service, the remaining nodes automatically and instantaneously assume the workload of the failed node. (Figure 1 depicts this concept.) The node is then automatically brought back into service and an automatic memory dump is performed. The memory dump is automatically analyzed for recurrence by system software running in another node in the cluster, and a problem report is automatically generated and sent to the HP Support Center.

Advanced networking architecture used in NonStop servers provides an extended, multi-homed IP service that is capable of having more than one IP address per cluster. In addition, each Gigabit Ethernet adapter supports up to four physical interfaces. Architecturally, the NonStop server can support hundreds of Gigabit Ethernet adapters. In addition, the NonStop architecture has the unique ability to run multiple application service instances per IP address and port, distributing connection requests in a round-robin load-balancing fashion.


Figure 1. Workload rebalancing on failure

It is possible to configure failover between two Gigabit Ethernet adapters so that both physical interfaces can listen for connections on the same IP address and the same port. These connections are processed on a first-come, first-served basis for incoming traffic, while a ping-pong load-balancing algorithm is used for outgoing traffic. In this context, ping-pong refers to using two connections in an alternating fashion; that is, the first packet is handled by the first physical interface, the second packet is handled by the second physical interface, the third packet is handled by the first physical interface, and so on.

In essence, the ping-pong technology doubles the bandwidth of the connection while helping to ensure that there is always a good physical interface in case of failure. By contrast, in a standby setup, it is possible that a takeover will encounter an alternate path that also has a failure, which is not detected until the alternate path is needed. The result is an outage.
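The alternation scheme is trivial to express. This minimal sketch (interface names are invented) shows outgoing packets cycling between the two physical interfaces, which is what keeps both paths continuously exercised:

```python
# Minimal sketch of ping-pong outgoing load balancing: packets alternate
# between two physical interfaces. Interface names are illustrative.
from itertools import cycle

class PingPongSender:
    def __init__(self, interfaces):
        self._next = cycle(interfaces)

    def send(self, packet):
        iface = next(self._next)      # alternate: eth0, eth1, eth0, ...
        return (iface, packet)

sender = PingPongSender(["eth0", "eth1"])
routed = [sender.send(p) for p in ("pkt1", "pkt2", "pkt3")]
assert [i for i, _ in routed] == ["eth0", "eth1", "eth0"]
```

Because every second packet travels the alternate path, a dead interface is discovered immediately rather than at takeover time, which is the failure mode of the standby setup described above.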

The round-robin load-balancing technology is fully implemented in the Gigabit Ethernet adapter hardware and is handled on a connection basis; that is, each node listening for a port connection is placed in a queue that is serviced in a first-in, first-out manner.
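The connection-level round-robin just described (performed in adapter hardware on the real system) amounts to a FIFO queue of listening nodes; this sketch assumes invented node names:

```python
# Sketch of round-robin connection distribution: each listening node waits
# in a FIFO queue; the head of the queue gets the next connection and then
# re-queues itself at the back.
from collections import deque

class ConnectionDistributor:
    def __init__(self, listeners):
        self.queue = deque(listeners)     # first-in, first-out

    def accept(self, connection):
        node = self.queue.popleft()       # head of the queue gets the work
        self.queue.append(node)           # back of the line for next time
        return (node, connection)

dist = ConnectionDistributor(["node1", "node2", "node3"])
assigned = [dist.accept(c)[0] for c in range(6)]
assert assigned == ["node1", "node2", "node3", "node1", "node2", "node3"]
```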

The NonStop server runs (and automatically restarts if need be) an instance of the application service in each node sharing the same IP address and port, thereby helping to ensure that connection requests are serviced in a load-balanced manner. There is also an option to start additional sets of application service instances listening on another port on the same IP address or a different IP address. Figure 2 shows this capability.


Figure 2. Round-robin load-balancing connection distribution

Virtualization

As mentioned earlier, all resources in the NonStop architecture are virtualized, including the database and application services. This means that every resource can be addressed by any node in the cluster as if the resource were local to the node. The actual location of the resource is unknown to the client addressing the resource. Virtualization is used to accomplish the following:

• Prevent failures from affecting the end user and the application programmer.

• Remove the need for application programmers to consider clustering deployment issues. The application programmer accesses all services using the virtual name, without needing to know where in the cluster the service instance is executing.

• Provide single-image management of all services. All services and service instances are managed by name (rather than by physical address), allowing the system administrator to manipulate a service freely without having to reconfigure clients of the service.

Services are implemented using different techniques, depending on their requirements:

• A global naming service is used to locate the node in which a service instance is currently executing. Thus, a client request for service access (by name) causes the operating system to look up the service-instance location in the global naming service and route the request to the appropriate service instance.


• The operating system compensates for the failure of a service instance by updating the global naming service to reflect the location of an alternate service instance and automatically rerouting the request to the alternate service instance.

• One global naming service is associated with each node in a cluster segment. The naming services are kept synchronized using an atomic operation; that is, an operation that guarantees, in case of a failure, that either all copies of the data are updated or no copies are updated.

• The naming services executing in each cluster segment cooperate to identify which service instance should service a client request.

• System services are provided as per-node services or as cooperating services that execute in selected nodes within the cluster segment. These technologies enable instant takeover in case of a node failure. This design removes the need for service restart after a failure and confirms that the alternate service instance has adequate state available to it.
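The lookup-and-reroute behavior in the list above can be sketched as follows. This is a hypothetical model (service and node names invented, and the atomic cross-node synchronization is elided): clients only ever present a name, and the naming service maps it to whichever registered node is currently up.

```python
# Hypothetical sketch of name-based service location with failure rerouting.
class GlobalNamingService:
    def __init__(self):
        self.directory = {}          # service name -> candidate nodes, in order

    def register(self, name, nodes):
        self.directory[name] = list(nodes)

    def resolve(self, name, up_nodes):
        # Return the first registered node that is still up; synchronization
        # of directory copies across nodes is elided in this sketch.
        for node in self.directory[name]:
            if node in up_nodes:
                return node
        raise LookupError(f"no live instance of {name}")

gns = GlobalNamingService()
gns.register("payments", ["node3", "node7"])
assert gns.resolve("payments", {"node3", "node7"}) == "node3"
# node3 fails: the same by-name request is transparently rerouted.
assert gns.resolve("payments", {"node7"}) == "node7"
```

The client code is identical before and after the failure, which is the property the virtualization section is claiming: location is an implementation detail of the naming service, not of the client.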

The same technologies are used for the database and application services. For example, a virtualized application service can be configured to run in some or all nodes in the cluster. When applicable, many instances of a single service can run within the same node as well as in several nodes.

Provisioning services

The provisioning services used in a NonStop server are configuration driven rather than script driven. The system administrator therefore focuses on defining a configuration, after which point the system automatically handles all provisioning tasks. Provisioning services are built in to the operating system, database services, and communication services.

At the application level, a dynamic provisioning system employs cluster-wide service virtualization (any service can be accessed from any node in the cluster) and transaction-level load balancing. In essence, this provisioning service provides a single application image of the cluster, removing the need for the complex and error-prone advanced scripting required with other cluster-oriented provisioning systems. Each service is virtualized across a segment, allowing the administrator to configure how many service instances are static (always available) and how many service instances are created or deleted in a dynamic (demand-driven) manner.

The dynamic creation of service instances is driven by load, combined with timeout-driven create and delete delays. For example, if a work request is received and no service instance is available to service the request, a new service instance is created if no service instance becomes free within a preconfigured wait period. As each transaction is received from a client, it is dynamically assigned to a service queue based on type and ordered by priority. Using queue dynamics (arrival rate/service rate), service resources are acquired or released from nodes in the cluster. As transactions are completed, the service waits for new work from the queue or is signaled to terminate, freeing resources for other services on the node.
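The wait-then-create rule above can be reduced to a toy model. This greatly simplifies the real queue dynamics (time is simulated as a single number, and all names are invented), but it captures the decision: reuse a static instance if one is free, wait up to the configured period for one to free up, and only then create a dynamic instance.

```python
# Toy model of demand-driven instance creation with a preconfigured wait
# period. Timings, names, and the simulated clock are assumptions.
class ServicePool:
    def __init__(self, static_instances, wait_period):
        self.free = list(static_instances)
        self.wait_period = wait_period
        self.dynamic_created = 0

    def dispatch(self, wait_needed):
        """wait_needed: simulated time until some busy instance frees up."""
        if self.free:
            return self.free.pop()            # a static instance is idle
        if wait_needed <= self.wait_period:
            return "reused-instance"          # one freed up within the period
        self.dynamic_created += 1             # demand-driven creation
        return f"dynamic-{self.dynamic_created}"

pool = ServicePool(["static-1"], wait_period=5)
assert pool.dispatch(wait_needed=0) == "static-1"
assert pool.dispatch(wait_needed=3) == "reused-instance"
assert pool.dispatch(wait_needed=9) == "dynamic-1"
```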

Database services

Data access managers

Each RAID-1 disk in the NonStop server has its own data access manager. The data access manager controls content, cache, and transaction-lock management directly next to the data as it comes off the disk drive. This design delivers the highest levels of performance while guaranteeing reliability and data integrity. (There is no need for a distributed lock manager in the NonStop server.)

The data access manager runs as a virtualized service pair distributed on two different nodes in the cluster segment, where each instance of the virtualized service has a number of local (to the node) helper services assigned to it. These two instances of the data access manager have assigned roles. One of the instances is considered the primary instance, and the other instance is considered the backup. The primary instance sends checkpoint messages that contain state information to the backup instance, helping to ensure that the backup instance can seamlessly continue the work of the data access manager if the primary instance fails for any reason.

The primary instance and all helper instances share a single priority-ordered service queue to provide scalable services. In addition, multiple instances can be applied to a single service request when needed in order to speed the execution of the request.

Note that all the processing described here is hidden from the data access manager clients; that is, the data access manager is one of many virtualized resources within the NonStop server.

HP NonStop SQL Database

The NonStop server runs an HP NonStop SQL Database (HP NonStop SQL/MX Database), which implements ANSI-compliant SQL. NonStop SQL provides clustered transaction and SQL database services.

Logically, the SQL tables and indexes provided by NonStop SQL are single entities. In reality, these tables can be partitioned over as many disks as needed. As in the case of clustering, database partitioning is completely transparent to the application.

The NonStop SQL data dictionary is an active dictionary. This means that the definitions of SQL objects in the data dictionary always match the actual state of the objects and are consistent with NonStop server data definitions. The active dictionary technology verifies that there is never a situation in which database tables have columns of data, partitions, or indexes that are not represented in the dictionary.

NonStop SQL guarantees that an individual SQL statement within a transaction either completes successfully or has no effect on the database. When a SELECT, INSERT, UPDATE, or DELETE statement encounters an error, it is possible to continue the transaction. The SQL statement is rolled back, so the statement has no effect on the database, but the transaction is not aborted. This process can occur as follows:

• If NonStop SQL encounters an error when attempting to insert, update, or delete a single row, then NonStop SQL returns an error. Because no change was made to the database, nothing is rolled back.

• If NonStop SQL encounters an error when attempting to insert, update, or delete multiple rows, then NonStop SQL rolls back only that statement and issues a warning in most circumstances.
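Statement-level atomicity, as distinct from transaction-level atomicity, can be illustrated with a small in-memory model. This is not NonStop SQL internals; it is a sketch showing a multi-row statement that fails partway being undone on its own while the enclosing transaction's earlier work survives:

```python
# Sketch of statement-level rollback inside a live transaction: the failing
# statement is undone from a statement-level savepoint; the transaction's
# earlier, successful statements are untouched.
class Transaction:
    def __init__(self, table):
        self.table = table

    def insert_rows(self, rows):
        before = dict(self.table)            # statement-level savepoint
        try:
            for key, value in rows:
                if key in self.table:
                    raise ValueError(f"duplicate key {key}")
                self.table[key] = value
        except ValueError:
            self.table.clear()
            self.table.update(before)        # roll back only this statement
            return "warning: statement rolled back"
        return "ok"

table = {1: "a"}
txn = Transaction(table)
assert txn.insert_rows([(2, "b")]) == "ok"
# Multi-row insert hits a duplicate: statement undone, transaction continues.
assert txn.insert_rows([(3, "c"), (1, "dup")]).startswith("warning")
assert table == {1: "a", 2: "b"}             # earlier statement preserved
```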

All Data Manipulation Language (DML) statements executing on tables and views are performed within a transaction, except when reading data with a transaction isolation level of READ UNCOMMITTED. If a deadlock occurs, the DML statement is canceled, and the transaction continues. The NonStop server uses a timeout-based mechanism to avoid deadlocks, so that if a statement takes too long to execute, a deadlock is assumed and the statement is canceled.

Concurrency refers to the ability of multiple service instances to access the same data at the same time. The degree of concurrency available depends on the purpose of the access mode (read or update) and the isolation level. An application service instance that requests access to data currently being accessed is given access or placed in a wait queue accordingly. Isolation levels (read uncommitted, read committed, repeatable read, or serializable) determine whether the active work of one transaction is visible to another active transaction.

NonStop SQL provides concurrent database access for most operations and controls database access through the mechanism for locking and the mechanism for opening and closing tables. For DML operations, access and locking options affect the degree of concurrency.


Database parallel processing using partitioning

The NonStop server is able to provide superior scalability by eliminating shared software and hardware resources. Scale is achievable by processing transactions in parallel. Such processing is possible on the NonStop server because

• The applications execute on multiple independent nodes.

• The database management system executes on multiple independent nodes.

• The database is stored on and accessed from multiple independent storage media that are accessed from multiple cluster segments as a single database instance.

• The database does not use a global cache or a global lock manager, thereby enabling improved scalability; there is no global resource that can become a bottleneck.

In comparison, deploying business applications on a set of UNIX® system–based application servers only addresses part of the problem. If all database requests share a common database service, scalability is limited by the fastest platform available to host the database management system. One commonly used solution to such scalability problems is to partition the database based on some attribute of the transactions or the database and then route transactions to one of several independent database management systems.

The conventional solution for clustered database systems is to partition the database in a way that limits the access of one node to another node's data partitions. This requires the applications or transactions running on each node to access only the partition associated with the node. This, in turn, makes load balancing difficult. For example, if there are bursts of one transaction type that overload a node, then that node will be overloaded while the other nodes are minimally busy. The more nodes there are, the more complex this process becomes.

To achieve adequate performance, complex partitioning and repartitioning is often necessary in reasonably scaled-out applications. In many cases, the database cannot be partitioned in such a way that the servers can also be dynamically load balanced from the transaction-workload standpoint.

Sometimes, conventional approaches use application partitioning, provided that the application lends itself to partitioning. This scale-aware programming places the partitioning burden in the application domain, which removes one of the primary advantages of a relational database: the separation of data manipulation from the business logic. Application partitioning is shown in figure 3.

Some of the challenges associated with application partitioning include

• The actions required to modify the application to achieve scale-out can delay growth of the application, often for six months or longer, which has an adverse impact on the business.

• Performance can suffer unless incoming transactions can be matched to the chosen partitioningscheme embedded in the application.

• Partitioning to avoid overlapping transactional access can be difficult.

• Rolling out new transaction types can require changing the partitioning scheme embedded in the application.

• Rolling out additional instances of the database to meet increased transaction rates can require changing the partitioning scheme embedded in the application, which can require modifying the application.

• Reassigning users to different instances can be constrained by the partitioning scheme embedded in the application.

• Overlap can cause conflict and adversely impact application performance.


Figure 3. Application partitioning

With the NonStop server, this scale-out partitioning moves into the database rather than having to be implemented in the application code. The impact is significant. Rather than waiting for application changes in order to be able to grow the business, the capacity of the NonStop server can be increased online, with no application development and no application downtime. Scale-out decisions become business decisions rather than decisions that are driven by technical limitations.

The NonStop server uses data partitioning that is invisible to applications (virtualized data access managers), thus eliminating the need to modify application code or the application configuration as the size of the database or the transaction rate grows (see figure 4).

Partitioning inside the database separates the problems of scaling the application from scaling the database. This allows load balancing to be based on data rates and to adjust dynamically without impacting the application.
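The contrast with application partitioning can be made concrete. In this hypothetical sketch (split keys and stores are invented), the application sees one logical table; a partition map inside the "database" routes each key to the store owning it, so repartitioning changes only the map, never the application's calls:

```python
# Sketch of partitioning hidden inside the database layer: the application
# calls put/get on one logical table; a range-partition map does the routing.
import bisect

class PartitionedTable:
    def __init__(self, split_keys, managers):
        self.split_keys = split_keys          # sorted upper bounds of ranges
        self.managers = managers              # one store per data access manager

    def _route(self, key):
        return bisect.bisect_left(self.split_keys, key)

    def put(self, key, row):
        self.managers[self._route(key)][key] = row

    def get(self, key):                       # same call whatever the layout
        return self.managers[self._route(key)][key]

table = PartitionedTable(split_keys=[1000], managers=[{}, {}])
table.put(42, "row-a")
table.put(5000, "row-b")
assert table.get(42) == "row-a" and table.get(5000) == "row-b"
assert 5000 in table.managers[1]              # landed on the second partition
```

Splitting a partition in this model means rewriting `split_keys` and moving rows between stores; every `put` and `get` call in the application is unchanged, which is the online-repartitioning property described in the next section.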

The net result of data partitioning is significant when contrasted with the scale-aware programming described earlier. As can be seen in figure 4, partitioning the data rather than the application drives the scale. The advantages are obvious: business requirements, rather than technology, now dictate the partitioning of databases.

Online database administration

The advantages of data partitioning go far beyond application independence. The NonStop server allows you to move data from one data access manager to another, or to split the data across multiple data access managers, completely online while applications continue to run, including applications that need write access to the data. The management of partitions is performed by database administrators who use the partitioning interfaces of the NonStop server. Customers can take advantage of HP Database Design Services to achieve more effective partitioning.


Figure 4. Data partitioning

Transaction Management Facility

The role of the NonStop server Transaction Management Facility is to protect the database from the effects of transaction failures, node and cluster failures, and media (disk drive) failures.

The Transaction Management Facility provides the following facilities to maintain data integrity during transaction processing:

• Transaction Backout cancels database updates made by an individual transaction when the transaction fails before completion.

• Disk Recovery recovers database files to their most recent consistent state when they become inconsistent because of a total disk (both RAID-1 disks fail) or cluster-segment failure.

• File Recovery reconstructs specified database files when the current copies on the RAID-1 disk pair are not usable.

• Point-in-Time Recovery reconstructs specified database files when the current copies on the RAID-1 disk pair are not usable because of a user programming error.

Because the Transaction Management Facility provides these back-out and recovery features automatically, application developers do not need to write special code to handle these situations.

The Transaction Management Facility helps ensure that a group of update operations gets applied atomically as one logical unit of work: either all operations are applied or none of them are applied. The Transaction Management Facility takes the concept of a transaction as a group of computer operations and redefines it as a single unit of work that is recovered as a whole in the event of a failure.

For example, if a failure interrupts a money-transfer transaction in which one account has beendebited but the other account has yet to be credited, the Transaction Management Facility backs outthe change made to the source account record; that is, it cancels the new balance in the sourceaccount and restores the old balance. As a result, the database looks exactly the way it did before the


transaction started. The database returns to a consistent state because the source account record and the target account record agree with each other.

When the transaction is redone and successfully completed, it puts the database in a new consistent state, where the source account is debited by the transferred amount, and the target account is credited by the transferred amount.
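This all-or-nothing behavior can be sketched in a few lines of Python, with the sqlite3 module standing in for NonStop SQL; the accounts table, the balance constraint, and the account IDs are illustrative assumptions, not part of the NonStop product:

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Debit source and credit target as one unit of work: either both
    updates are applied, or the whole transaction is backed out."""
    try:
        with conn:  # commits on success, rolls back (backs out) on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target))
    except sqlite3.Error:
        return False  # backed out: the database looks as it did before
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts "
             "(id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('src', 100), ('dst', 0)")
conn.commit()

ok = transfer(conn, "src", "dst", 60)       # both updates applied
failed = transfer(conn, "src", "dst", 200)  # would overdraw: backed out
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed attempt, neither account has changed: the database remains in the consistent state left by the last successful transfer.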

Full transaction protection is provided with the atomicity, consistency, isolation, and durability (ACID) properties necessary for mission-critical applications. Transaction-isolation levels include read uncommitted, read committed, repeatable read (serializable), stable access, clean access, and (for publish/subscribe) skip-conflict access (an HP database extension).

To summarize, the purpose of the Transaction Management Facility is to enable transactions to transform the database from one consistent state to another consistent state.

Virtualized resource

The Transaction Management Facility is another example of a virtualized resource: transactions can be started anywhere in the cluster to be serviced by a virtualized resource that is responsible for all transaction management in the cluster segment. The Transaction Management Facility instances in each cluster segment collaborate, thereby providing the view of a virtual resource that spans the whole cluster.

Each cluster segment has a single, centralized transaction log that is independent of where transactions originate, providing a single point of management and restoration regardless of how many different types of application environments are running in the NonStop system. As with many other resources in the NonStop server, a coordinating service runs in two of the nodes in the cluster segment that communicate with each other via checkpoint messages. This coordinating service then employs helper servers running in each of the nodes in the cluster segment, thereby providing locality of units of work and linear scalability.

Transaction logs

The Transaction Management Facility uses a single centralized transaction log in each cluster segment, protected by inherent NonStop duplication technology, to monitor all the changes made by a transaction. Each update to the database generates a record in the transaction log. These transaction log files are used cyclically by the Transaction Management Facility. When a transaction log file is archived, the file becomes available for the Transaction Management Facility to write new transaction log records into it. The Transaction Management Facility keeps track of the sequence of transaction logs and can automatically restore specific logs, if needed.

The transaction logs are implemented as a separate group of disk files that contain information about each transaction against database files. The transaction log information for each transaction includes a Before Image and an After Image of each database record that was changed by the transaction. The Before Image shows the value of each field in the record before the update. The After Image shows the value of each field in the record after the update. The number of transaction log groups depends on the transaction-load capacity of the specific NonStop server configuration.
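The Before Image/After Image mechanism can be sketched as follows; the in-memory dictionary database, the record keys, and the helper names are illustrative assumptions, not NonStop interfaces:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogRecord:
    """One transaction-log entry: a record's value before and after an update."""
    txn_id: int
    key: str
    before: dict  # Before Image: field values prior to the update
    after: dict   # After Image: field values following the update

def apply_update(db, log, txn_id, key, new_fields):
    """Update a record, appending its Before and After Images to the log."""
    before = dict(db[key])
    db[key] = {**db[key], **new_fields}
    log.append(LogRecord(txn_id, key, before, dict(db[key])))

def undo(db, log, txn_id):
    """Transaction Backout: reapply Before Images in reverse order."""
    for rec in reversed([r for r in log if r.txn_id == txn_id]):
        db[rec.key] = dict(rec.before)

db = {"acct:src": {"balance": 100}, "acct:dst": {"balance": 0}}
log = []
apply_update(db, log, txn_id=1, key="acct:src", new_fields={"balance": 40})
apply_update(db, log, txn_id=1, key="acct:dst", new_fields={"balance": 60})
undo(db, log, txn_id=1)  # cancel the transfer: database returns to prior state
```

Redo works the same way in the other direction, reapplying the After Images of a committed transaction.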

Note that the transaction log for each database file is stored on one or more RAID-1 disks, which means that the transaction log is stored on two disk drives that hold identical data. RAID-1 protection of the transaction log helps ensure that the transaction log information is still available if one of the disk drives fails.

The Transaction Management Facility uses transaction log information for two purposes:

• To undo transactions from the database when they fail before completion


• To reapply successful transactions to the database if the whole NonStop server fails or both drives in a RAID-1 disk fail

But the Transaction Management Facility must do more than maintain a transaction log to keep control over a transaction. The Transaction Management Facility relies on the data access managers to lock all the database records affected by the transaction for the duration of the transaction. Locking a record for a transaction simply means preventing other transactions from accessing and changing the record until the transaction is completed.

Because of record locking, the Transaction Management Facility has a stable Before Image and After Image of each record changed by a transaction. The Transaction Management Facility can undo or redo the database changes made by a transaction with the assurance that no other transaction could have altered the same records while the transaction was in progress.

Database backups

A combination of Database Dumps and Log Dumps makes database recovery possible in the event of:

• Corruption or loss of a database file because of an application error, operations error, or a catastrophic hardware failure
• Environmental catastrophe
• Corruption of a transaction log disk

A Database Dump provides a complete, historical view of a database file while continuing to allow updates to the file to maintain database consistency. Database Dumps are scheduled and performed automatically.

A Log Dump involves making a backup copy of a transaction log file. A transaction log file is scheduled for dumping as soon as it is full. The file becomes available for reuse when it has been dumped, when it contains no active transactions, and when no disk needs the file for Disk Recovery. (For more information, refer to the "Disk Recovery" section later in this paper.)

A transaction log file becomes a candidate for dumping when it contains no more active transactions.

The NonStop server looks for Log Dump candidates on a regular basis. Multiple log files are used in a round-robin manner to allow continual processing.

Transaction commit

Three things occur while a transaction is in progress:

• The Transaction Management Facility causes the data access managers to lock all records affected by the transaction, based on the defined transaction-isolation levels.

• The transaction performs updates by changing the values of fields in existing records, deleting existing records, or adding new records.

• The Transaction Management Facility generates transaction log information consisting of a Before Image and an After Image of each updated record.

But what happens at the conclusion of a successful transaction? What occurs inside the database? In other words, how does transaction commit work?

Transaction commit is based on an embedded, two-phase commit architecture. In a two-phase commit architecture, the transaction is committed only after all the preceding database operations within the transaction have updated records. The transaction-commit processing involves a number of steps:


• The Transaction Management Facility sends a message to the data access manager for the transaction log disk, telling it to move the transaction log data that has been building up in main memory buffers to the transaction log files on disk.

• After the data access manager has written the transaction log data to disk, the Transaction Management Facility writes a transaction commit record in the transaction log. The transaction logs are stored safely on RAID-1 disks. If a server failure or a failure of a database disk occurs after this point, the entire transaction can be reconstructed from the transaction log file.

• After writing the transaction commit record, the Transaction Management Facility sends a message to each participating data access manager, telling it to release the locks held for the transaction.

• Each data access manager releases its locks.
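The commit steps above can be sketched as plain function calls; in the real server these are messages between components, and the class and method names here are hypothetical:

```python
class LogDisk:
    """Toy model of the transaction log disk and its in-memory buffer."""
    def __init__(self):
        self.buffer, self.on_disk = [], []
    def append(self, rec):
        self.buffer.append(rec)      # log data builds up in memory first
    def flush(self):
        self.on_disk += self.buffer  # force buffered log records to disk
        self.buffer = []

class DataAccessManager:
    """Toy model of a data access manager holding record locks."""
    def __init__(self):
        self.locks = {}
    def lock(self, txn_id, key):
        self.locks.setdefault(txn_id, set()).add(key)
    def release_locks(self, txn_id):
        self.locks.pop(txn_id, None)

def commit(txn_id, log_disk, data_access_managers):
    # 1. Move buffered transaction log data to the log files on disk.
    log_disk.flush()
    # 2. Write the commit record; once it is on disk, the transaction
    #    can be reconstructed even if the server fails.
    log_disk.append(("COMMIT", txn_id))
    log_disk.flush()
    # 3-4. Tell each participating data access manager to release its locks.
    for dam in data_access_managers:
        dam.release_locks(txn_id)

log_disk = LogDisk()
dam = DataAccessManager()
dam.lock(7, "acct:src")
log_disk.append(("UPDATE", 7, "acct:src"))
commit(7, log_disk, [dam])
```

The ordering is the essential point: the log data and commit record reach disk before any lock is released, so other transactions never see an update whose durability is still in doubt.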

What has really happened to the database now that the transaction has been successfully completed? After the transaction commit, all the changed records are changed permanently. The Transaction Management Facility can no longer undo the changes. Another transaction can now access the changed records and change them in its turn.

Transaction backout

What happens when a transaction cannot complete all its changes successfully? What does the Transaction Management Facility do then? Transaction-backout processing takes place when a failure occurs or when an application or operator request is made to undo all of the updates performed so far in this transaction.

The Transaction Management Facility responds by accessing the transaction log, locating the information for the transaction (the Before Images and After Images of the records changed by the transaction), and applying the Before Image of each record to the database.

Disk Recovery

The Transaction Management Facility is very effective in backing out individual transactions that have failed, but it sometimes must deal with a more serious problem: the failure of many transactions at the same time.

Various types of hardware failures can cause the failure of multiple transactions. For example, if an uninterruptible power supply (UPS) is not installed, a city-wide power outage could cause a complete cluster-segment failure, together with the failure of all current transactions in the segment.

The segment's Transaction Management Facility responds to these relatively unlikely failures with a recovery procedure known as Disk Recovery.

Suppose that an extended power failure brings down the segment. In such a case, all transactions that were updating the database at the time of the failure must be backed out or redone:

• Transactions that were not completed or had not committed their changes at the time of failure must be backed out.

• Transactions that committed their changes before the failure must be redone.

It makes sense that incomplete or uncommitted transactions have to be backed out. But why do the committed transactions need to be reapplied?

The reason is that the segment does not write changed database records to disk as soon as they are updated. Instead, the segment allows a number of changed records to accumulate in cache (a high-speed area of main memory) before writing the whole set of records to disk in a single physical update operation.


This write-in-cache procedure significantly reduces the number of writes to database files and keeps performance levels high. The only problem is that if the segment fails, some changed records in cache might not yet be changed on disk.

The Transaction Management Facility solves this potential data integrity problem by redoing committed transactions after a failure, to verify that the database changes made by such transactions are reflected on disk.

After the segment comes back up, the Transaction Management Facility automatically initiates Disk Recovery to redo the committed transactions and then undo any incomplete transactions.

Disk Recovery uses the After Images in the transaction log to redo committed transactions. But how does Disk Recovery determine which transactions need to be redone? It limits the amount of work by periodically requiring the data access manager to perform a routine known as control-point processing.

As discussed earlier, the data access manager manages physical updates to a disk. When called upon to perform control-point processing, the data access manager writes to disk the changed records that have accumulated in cache. The data access manager then informs the Transaction Management Facility of a new "redo location" in the transaction log, where Disk Recovery must begin.

When it is time for Disk Recovery to apply the After Images of committed transactions to the database, Disk Recovery needs to read only the portion of the transaction log that follows the most recent redo location. This redo processing can be done safely because applying the same transaction a second time yields the same result. The redo processing therefore either causes the database to process the same transaction a second time or to process a transaction for the first time. Regardless, the result is that the database is updated with all committed transactions whose changes had not yet reached disk.

After redoing the successful transactions, Disk Recovery goes through the transaction log again and backs out any incomplete transactions by applying Before Images to the affected records on disk.
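A minimal model of this redo-then-undo procedure is sketched below, assuming a simplified log-record format and a redo location of 0 in place of one supplied by control-point processing; the record layout and names are illustrative:

```python
def disk_recovery(disk, log, redo_location):
    """Redo committed transactions from the redo location onward, then
    undo incomplete ones, restoring the on-disk database to consistency."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Redo: apply After Images of committed transactions. This is safe
    # to repeat (idempotent): reapplying the same image gives the same result.
    for rec in log[redo_location:]:
        if rec[0] == "UPDATE" and rec[1] in committed:
            disk[rec[2]] = rec[4]   # key <- After Image
    # Undo: apply Before Images of transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            disk[rec[2]] = rec[3]   # key <- Before Image

# Log records: ("UPDATE", txn, key, before_image, after_image) or ("COMMIT", txn)
log = [
    ("UPDATE", 1, "a", 0, 10),  # txn 1 committed, but its change to "a"
    ("COMMIT", 1),              # was still in cache when the segment failed
    ("UPDATE", 2, "b", 0, 5),   # txn 2 never committed
]
disk = {"a": 0, "b": 5}         # txn 2's cached change had reached disk
disk_recovery(disk, log, redo_location=0)
```

After recovery, the committed change to "a" is on disk and the uncommitted change to "b" has been backed out, which is exactly the consistent state the two steps are meant to restore.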

The advantages of Disk Recovery are that

• The database is restored to a consistent state within minutes because the use of data access manager control points and redo locations limits the number of transactions that must be redone.

• If necessary, the Transaction Management Facility starts Disk Recovery automatically when the NonStop server comes back up.

In summary, Disk Recovery performs database recovery in two steps. It:

• Applies After Images to redo committed transactions
• Applies Before Images to undo any incomplete transactions

The end result of both steps is the restoration of the database to a consistent state.

File Recovery

Disk Recovery provides a quick, handy solution to the problem of multiple transaction failures resulting from a hardware failure. But in certain cases, Disk Recovery cannot be used to recover from the effects of a hardware failure. For example, the failure of both disk drives in a RAID-1 disk can potentially destroy the contents of database files and make the files unusable.

Why can't Disk Recovery be used to recover from such failures? Remember that Disk Recovery uses the current version of the database on disk as the starting point of recovery. It reapplies successful transactions to this database and backs out unsuccessful transactions. If the disks containing the current version of the database are damaged, Disk Recovery has nothing to work with. Instead, File


Recovery must be used, which involves the use of Database Dumps and transaction logs to recover a damaged database file, and includes the following steps:

• The database file is restored from the latest available Database Dump.
• File Recovery brings the archive copy of the database up to date (up to a specific time stamp, if so desired) using the After Images found in the transaction log files. File Recovery thereby restores the database to the state it was in at the time of the catastrophic failure.

Recovery time depends on whether all the transaction log files are on disk or whether some of them have to be restored from backup media. File Recovery copies records from the transaction log to the database file being restored from the Database Dump. File Recovery is very fast. However, the total time for the File Recovery operation depends on how many Log Dumps have been taken since the last Database Dump, as well as on the size of the Log Dumps.
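The restore-then-roll-forward procedure can be sketched as follows; the dump and log structures are illustrative assumptions, and the optional timestamp shows how Point-in-Time Recovery limits the roll-forward:

```python
def file_recovery(database_dump, log_records, up_to=None):
    """Restore a file from its latest Database Dump, then roll forward by
    applying After Images from the transaction logs, optionally stopping
    at a timestamp (Point-in-Time Recovery)."""
    db = dict(database_dump)            # step 1: restore the archived copy
    for ts, key, after_image in log_records:
        if up_to is not None and ts > up_to:
            break                       # stop at the requested point in time
        db[key] = after_image           # step 2: reapply each After Image
    return db

dump = {"acct": 100}                    # Database Dump taken at time 0
logs = [(1, "acct", 90), (2, "acct", 250), (3, "acct", 40)]

full = file_recovery(dump, logs)        # state at the time of the failure
as_of_t2 = file_recovery(dump, logs, up_to=2)  # undo a bad change after t=2
```

The second call illustrates the Point-in-Time Recovery case mentioned earlier: rolling forward only to a chosen timestamp discards the effects of a user programming error made after that point.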

Disaster protection

The NonStop server provides customers with the option of purchasing both synchronous and asynchronous disaster protection by using a secondary NonStop server at a remote site. The distance limitation for synchronous disaster protection is currently 100 kilometers (62 miles). Asynchronous disaster protection has no distance limitation.

Disaster protection for the NonStop server provides transactional consistency at all times, while hiding the replication fully from the application. Enterprise storage replication, by comparison, does byte replication and therefore cannot make such guarantees.

Synchronous disaster protection helps ensure that the transaction has been replicated to the backup site before the NonStop server replies to the application. This type of protection verifies that all transactions are available on the backup NonStop server if a disaster occurs. From an application perspective, this technology yields guaranteed data availability at a shorter distance than can be used for asynchronous disaster protection. Synchronous disaster protection is based on accessibility of a transaction log copy to another system, so no explicit writes to another system need to occur before the reply is sent to the application.

Asynchronous disaster protection replicates transactions after the reply has been sent to theapplication, thereby providing any-distance disaster protection.

Regardless of whether disaster protection is synchronous or asynchronous, there is no effect on transaction response time. In addition, HP's Zero Lost Transaction (ZLT) technology provides the best of both worlds: asynchronous response time with no loss of committed transactions.
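The trade-off between the two modes can be illustrated with a toy model; the class, its behavior, and the single-method commit are illustrative assumptions (ZLT is not modeled):

```python
class Replicator:
    """Toy model: synchronous replication has the log record at the backup
    site before replying; asynchronous replies first and ships it later."""
    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.backup_log = []   # records safely at the remote site
        self.pending = []      # records committed locally, not yet shipped

    def commit(self, record):
        if self.synchronous:
            self.backup_log.append(record)  # replicate before replying
            return "replied"
        self.pending.append(record)         # reply now, ship afterwards
        return "replied"

    def drain(self):
        """Asynchronous shipment catching up with the primary."""
        self.backup_log += self.pending
        self.pending = []

sync = Replicator(synchronous=True)
sync.commit("T1")                        # T1 is at the backup before the reply

async_rep = Replicator(synchronous=False)
async_rep.commit("T2")
lost_if_disaster_now = async_rep.pending  # window where T2 exists only locally
async_rep.drain()                         # normally the backlog drains quickly
```

The `pending` list is the essential difference: it is empty at reply time under synchronous protection, while under asynchronous protection a disaster striking before the drain could lose those committed transactions.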

If a failure occurs

What happens if a failure does occur? What is the visibility to the application client, and what does the application client have to do?

Because of the active redundancy of all software and hardware used in the NonStop server, most failures are fully masked from the application client. For example, the failure of one of the disk drives in the RAID-1 disk is masked by the data access manager, and therefore does not affect the application client at all.

The only time that a failure is visible to the application client is when the node in which the application service instance is running fails. In such cases, an error is returned to the application client, and a Transaction Backout is performed automatically. However, all the remaining nodes in the NonStop server contain other application service instances. Because transactions are not node-specific, the application client or whatever access method is being used (e.g., Java™


Database Connectivity [JDBC] or Open Database Connectivity [ODBC]) has only to retry the transaction in order for it to succeed.
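From the client side, this retry amounts to something like the following sketch; the `ConnectionError` exception and the helper names are hypothetical stand-ins for whatever failure signal a particular driver raises:

```python
def run_transaction(execute, retries=1):
    """Run a transaction, retrying if the service node fails. Because a
    Transaction Backout is performed automatically on failure, the
    database is unchanged and a clean retry is safe."""
    for attempt in range(retries + 1):
        try:
            return execute()
        except ConnectionError:    # node failure surfaces as an error
            if attempt == retries:
                raise              # give up after the allotted retries

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("service node failed")  # first node goes down
    return "committed"  # another service instance handles the retry

result = run_transaction(flaky)
```

The first attempt fails with the node; the retry is serviced by an application service instance on a surviving node and succeeds.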

Self-management

The NonStop server uses industry-leading self-management features in the following categories:

• Self-configuration: The configuration of components is automated, and nodes follow high-level policies. The rest of the cluster adjusts automatically and seamlessly.

• Self-optimization: Components and nodes continually seek to improve their own performance and efficiency.

• Self-diagnosis: The server can detect and diagnose its own problems.
• Self-healing: The server automatically repairs localized software and hardware problems, in some cases also automatically reintegrating the repaired resource back into itself.
• Self-protection: The server automatically defends against malicious attacks and cascading failures. It anticipates and prevents server-wide failures.

To provide these features, the NonStop server uses the following interdependent techniques:

• Clustering of relatively autonomous processors: A NonStop server consists of two to 4,080 nodes, configured in a shared-nothing cluster; that is, there is no centralized coordination node, as in some other cluster solutions.

• Server self-management: The server is capable of automated configuration changes, optimization, diagnosis, repair, and protection. Such capabilities are natural for a server that is designed from the ground up to be tolerant of single faults.

• Resource virtualization: From an application perspective, every resource in the NonStop server is virtualized; no aspects of software or hardware redundancy are made visible to the client. Virtualizing resources provides transparent server self-management that is hidden from the clients.

In other words, to deliver the highest possible server availability, a clustered solution is needed. To deliver such a solution at the lowest possible total cost of ownership (TCO), self-management techniques are needed. To deliver transparent server self-management, cluster resources must be virtualized from an application perspective, allowing automatic changes to the computing environment without forcing the application to implement self-management techniques.

Over the last 30 years, HP has developed many self-management technologies and continues to make advances in self-configuration, self-optimization, self-diagnosis, self-healing, and self-protection. All these technologies are included in the NonStop server. Following are examples of such technologies that complement other examples provided earlier in this paper.

Self-configuration

• Automated server reconfiguration on expansion or reduction (server resizing): Processors and devices can be added to or removed from the server online, with the server adjusting its configuration automatically.
• Automated configuration of adapters and disk drives: An adapter or a disk drive added to the server is automatically configured and started. If a disk drive is repaired, an online data copy is started automatically to bring the RAID-1 disk back to a synchronized state while the application client can continue to access the database in the NonStop server. Once the data copy is complete, the server automatically begins using both disk drives for reads and writes again, without the application client noticing that anything is amiss.


Self-optimization

• Multinode port sharing: The NonStop server allows multiple application service instances within a cluster segment to share the same IP address and port, providing round-robin load balancing on a connection basis.

• Load-dependent disk data copy: When performing an online copy to bring a RAID-1 disk pair back into a synchronized state, the data access manager automatically varies how fast and how much data to copy, depending on the server load.

• Mixed-workload support: The NonStop server allows online transaction processing (OLTP) and decision support queries to be processed concurrently without degradation, which simplifies data administration and design (there's no need to synchronize separate databases). It does this by assigning priorities to individual workloads based on:
  – How urgent the need is for the information
  – Cost as it relates to I/O and processor cycles

  The architecture then dynamically monitors these competing priorities to meet the response goals of higher-priority workloads.

• Database query enhancement: The HP Database Engine enhances queries across the entire cluster. The database engine takes into account the number of nodes in the cluster, which data access managers need to participate, the partitioning of data, and the query itself when optimizing a query.
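The round-robin port sharing described in the first bullet above can be sketched as follows; the instance names are hypothetical:

```python
from itertools import cycle

class SharedPort:
    """Toy model: incoming connections to one shared IP address and port
    are handed to application service instances in round-robin order."""
    def __init__(self, instances):
        self._next = cycle(instances)   # endless round-robin over instances
    def accept(self):
        return next(self._next)         # assign the next connection

port = SharedPort(["svc@node0", "svc@node1", "svc@node2"])
assigned = [port.accept() for _ in range(4)]  # wraps back to the first node
```

Because balancing happens per connection, load spreads evenly across the cluster segment without the client ever needing to know how many instances exist.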

Self-diagnosis

• Detection of latent failures: Alternate paths to devices and processors are used either in an alternating fashion (e.g., the server switches between available paths on a predetermined time interval) or checked periodically. Data in memory and on disks is checked periodically (data scrubbing). This technology and other implementations of latent-failure detection allow HP to know when a component can be replaced or upgraded safely; that is, the system validates that a backup resource can take over work when the primary resource is removed from the server.

• Background quality scans of data: Data on disks is scanned on a scheduled basis to detect and repair latent failures. This feature is especially important for detecting problems in data that is rarely touched, because the checksums for such data are normally not reevaluated while the data is at rest.

• Incident analysis with automated data collection: Based on a highly structured common-event system, incident-analysis software is able to diagnose more than 90 percent of all hardware failures automatically with close to 95 percent accuracy. Data needed for problem analysis is collected automatically and sent to the HP Global Customer Support Center (HP Support Center).

Self-healing

• Service pairs and per-node services: Based on resource virtualization, many services are implemented as service pairs (two collaborating instances of a service running in different nodes that checkpoint state data) or as per-node services. If one service or its node fails, the request is automatically rerouted to the remaining service, thereby fully hiding the node failure from the NonStop server clients.

• Persistent services: All services are protected by a Service Persistence Manager, which restarts the service automatically if it fails. For example, if an operating system service instance fails, it is automatically restarted.

• Automated data repair: If the disk data-scrubbing software detects a data error, the incorrect data is repaired from the second disk drive in the RAID-1 disk pair.


• Automated reinstatement of repaired hardware with sanity checks: Repaired hardware inserted into the cluster is detected automatically. For some types of hardware, a sanity check is performed (e.g., by sending test packets) before the hardware is fully reinstated.

Self-protection

• End-to-end data checksums: All messages in the server are checked from beginning to end. All system data buffers are protected with buffer tags.
• Fail-fast technology: If a checksum error is encountered, the action is retried. If a node detects a nonrecoverable error, it halts itself to preserve data integrity.
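The check-retry-halt sequence can be sketched as follows, using CRC-32 as a stand-in for the server's actual checksums; the function names and message format are illustrative assumptions:

```python
import zlib

def send(payload):
    """Attach an end-to-end checksum to a message."""
    return payload, zlib.crc32(payload)

def receive(message, resend, max_retries=1):
    """Verify the checksum; retry the action on mismatch, and halt the
    node (fail fast) if the error proves nonrecoverable."""
    payload, checksum = message
    for _ in range(max_retries + 1):
        if zlib.crc32(payload) == checksum:
            return payload
        payload, checksum = resend()  # checksum error: retry the action
    raise SystemExit("nonrecoverable checksum error: halting to protect data")

good = receive(send(b"record"), resend=lambda: send(b"record"))

# A corrupted payload fails the first check; the retry delivers a clean copy.
corrupt = (b"recXrd", zlib.crc32(b"record"))
recovered = receive(corrupt, resend=lambda: send(b"record"))
```

Halting rather than proceeding on a nonrecoverable error is the fail-fast design choice: a stopped node can be recovered, but silently propagated corruption cannot.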

Automated incident reporting and ticketing

The NonStop server uses the HP Internet Services Enterprise Edition (ISEE) product, which enables remote service of HP equipment.

Within the server, monitoring software reacts to and analyzes any problem that cannot be corrected with the self-management technologies described previously. If a problem is identified, an Incident Report is created and sent through ISEE to the global HP trouble-ticketing system. Once an Incident Report is received, the HP trouble-ticketing system automatically creates a trouble ticket and assigns it to an HP support specialist, who can obtain information about the problem from the data contained in the Incident Report and connect to the customer server using ISEE.

Once connected to the server, the HP support specialist has access to a number of different tools to determine whether a component needs to be replaced, a reconfiguration is needed, a software update is needed, and so on. In addition, the HP support specialist has access to advanced search capabilities in the trouble-ticket system and a vast knowledge database. Finally, second- and third-level support is available, combined with a developer community that is highly trained in reacting to critical customer problems.

If an HP customer engineer needs to be deployed to replace a hardware component, automatically generated guided procedures assist the HP customer engineer in replacing the hardware and bringing the system back to full service.

Monitoring software

NonStop server monitoring software employs both reactive and proactive techniques.

Reactive monitoring is based on a highly structured common event service used by all services running in the NonStop server. This is combined with analysis software that determines whether the creation of an Incident Report and a subsequent dial-out to the HP Support Center is warranted. For hardware failures, the analysis software determines what the root cause of the problem is and what repair action should be used to fix the problem.

For node failures, regardless of whether they are caused by a hardware problem or by the software-based data-consistency methods described earlier, a reload is performed. After the reload, a memory dump is performed automatically. After the memory dump is analyzed, failure-analysis software determines whether the problem has been seen before. In all cases, an Incident Report is generated and sent to the HP Support Center. The Incident Report indicates whether the memory dump shows a recurrence of a problem.

It should be noted that the Incident Report system will not send a report for an event about the same resource (e.g., a disk file) more than once every eight hours. This helps ensure that the HP Support Center is not overwhelmed with new trouble tickets. Also, if many problems occur at the same time in the cluster, only one Incident Report is sent to the HP Support Center, consolidating the problem reports.
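This suppression rule can be sketched as a simple per-resource rate limiter; the resource name and time units are illustrative:

```python
def should_send(reports, resource, now, window=8 * 3600):
    """Suppress repeat Incident Reports about the same resource within
    the window (eight hours, expressed in seconds)."""
    last = reports.get(resource)
    if last is not None and now - last < window:
        return False          # same resource, too soon: suppress the report
    reports[resource] = now   # record this report's time
    return True

reports = {}
first = should_send(reports, "disk:$DATA01", now=0)           # sent
repeat = should_send(reports, "disk:$DATA01", now=3600)       # suppressed
later = should_send(reports, "disk:$DATA01", now=9 * 3600)    # window elapsed
```

Only the first event in each eight-hour window produces a report, so a flapping resource generates one trouble ticket rather than hundreds.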


In addition to reactive monitoring, the NonStop server also uses proactive monitoring on a scheduled basis to, for example, help ensure that the NonStop server is running according to its preferred configuration. If it is not, an event is generated that causes an Incident Report to be created, as described previously. For the base service, this proactive monitoring covers basic system entities (running services, working connections, etc.). Add-on services can provide proactive monitoring of entities such as disk usage, performance, and capacity.

Enterprise management

The NonStop server can be monitored with any enterprise management tool that supports Simple Network Management Protocol (SNMP) or DMTF Web-Based Enterprise Management (WBEM). For example, HP OpenView and HP Systems Insight Manager software both detect the existence of the NonStop server and show the alarms that it generates. Of course, customers can also use other enterprise management tools, such as BMC Control, CA Unicenter, or IBM Tivoli.

It is also possible to add tools that provide more advanced monitoring of the NonStop server and that integrate well with the enterprise management tools. These types of tools are available as add-on options for the NonStop server.

Conclusion

The NonStop server consists of a virtualized, single-image, clustered environment that scales linearly from two to 4,080 nodes. The NonStop server provides complete active redundancy throughout the architecture, with no single point of failure. Virtualization is used for every service provided in the system, including devices, the operating system, the database, and application environments. Self-management technologies, such as automated configuration changes, optimization, diagnosis, repair, and protection, provide an easy-to-manage, single view of the cluster environment. Such capabilities are natural for a server that is designed from the ground up to be tolerant of hardware and software faults.

Best of all, customers don't need to separately buy, configure, tune, and manage multiple, complex clustering technologies (e.g., database clusters, application clusters, operating system clusters) as they do with other systems. With NonStop servers, it's easy and inexpensive to build and operate large applications and databases that need to provide uncompromised data integrity and 24x7 operation, day in and day out, year after year, all the time, every time.


For more information

www.hp.com/go/nonstop

© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Intel and Itanium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Java is a US trademark of Sun Microsystems, Inc. UNIX is a registered trademark of The Open Group.

4AA0-3156ENW, February 2006