7-1-1 integration server clustering guide

Title Page

webMethods Integration ServerClustering Guide

Version 7.1.1

January 2008

webMethods

Copyright& Docu‐ment ID
This document applies to webMethods Integration Server Version 7.1.1 and to all subsequent releases.

Specifications contained herein are subject to change and these changes will be reported in subsequent release notes or new editions.

© Copyright Software AG 2008.

All rights reserved.

The name Software AG and/or all Software AG product names are either trademarks or registered trademarks of Software AG. Other company and product names mentioned herein may be trademarks of their respective owners.

Document ID: IS-CL-711-20080128

Table of Contents

About This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Additional Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1. An Overview of Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7What Is Clustering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8What Data Is Shared in a Cluster? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Overview of Failover Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Overview of Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Guaranteed Delivery and Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Checkpoint Restart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Putting it All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

webMethods Integration Server Session Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Scheduled Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Client Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2. Installation and Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Considerations for Using webMethods Trading Networks in a Clustered Environment 16Considerations for Using webMethods Adapters in a Clustered Environment . . . . . . . 16

Setting Up the webMethods Integration Servers in the Cluster . . . . . . . . . . . . . . . . . . . . . . 17Server Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Switching from the Embedded Database to an External RBMS . . . . . . . . . . . . . . . . . . 18Configuring the Server to Use Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Enabling Clustering and Specifying Clustering Parameters . . . . . . . . . . . . . . . . . 19Scheduling Jobs to Run in the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Using Guaranteed Delivery with Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3. Managing Server Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23Viewing the Servers in the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Removing a Server from the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Adding a Server Back into the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25Taking a Server Offline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26Bringing a Server Back Online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4. Building Checkpoint Restart into Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Checkpoint/Restart Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Sample Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

A. Short-Term Store Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36Usage Tips for the pub.storage Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

webMethods Integration Server Clustering Guide Version 7.1.1 3

Table of Contents

4 webMethods Integration Server Clustering Guide Version 7.1.1

About This Guide

This guide describes how to install and configure the webMethods Integration Server Clustering feature. It contains information for administrators who configure and manage a webMethods Integration Server and for application developers who want to create services that interact directly with the Integration Server short‐term store.

To use this guide effectively, you should:

Understand the basic concepts described in the webMethods Integration Server Administrator’s Guide and the webMethods Developer User’s Guide.

Be familiar with all the servers you want to include in the cluster.

Document Conventions

Convention Description

Bold Identifies elements on a screen.

Italic Identifies variable information that you must supply or change based on your specific situation or environment. Identifies terms the first time they are defined in text. Also identifies service input and output variables.

Narrow font Identifies storage locations for services on the webMethods Integration Server using the convention folder.subfolder:service.

Typewriter font

Identifies characters and values that you must type exactly or messages that the system displays on the console.

UPPERCASE Identifies keyboard keys. Keys that you must press simultaneously are joined with the “+” symbol.

\ Directory paths use the “\” directory delimiter unless the subject is UNIX‐specific.

[ ] Optional keywords or values are enclosed in [ ]. Do not type the [ ] symbols in your own code.


About This Guide

Additional InformationThe webMethods Advantage Web site at http://advantage.webmethods.com provides you with important sources of information about webMethods products:

Troubleshooting Information. The webMethods Knowledge Base provides troubleshooting information for many webMethods products.

Documentation Feedback. To provide feedback on webMethods documentation, go to the Documentation Feedback Form on the webMethods Bookshelf.

Additional Documentation. Starting with 7.0, you have the option of downloading the documentation during product installation to a single directory called “_documentation,” located by default under the webMethods installation directory. In addition, you can find documentation for all webMethods products on the webMethods Bookshelf.


http://advantage.webmethods.com

http://advantage.webmethods.com/kb_top

http://advantage.webmethods.com/docfeedback

http://advantage.webmethods.com/Bookshelf

http://advantage.webmethods.com/Bookshelf

1 An Overview of Clustering

What Is Clustering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

What Data Is Shared in a Cluster? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Overview of Failover Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Overview of Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

webMethods Integration Server Session Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Scheduled Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Client Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13



What Is Clustering?Clustering is an advanced feature of the webMethods product suite that substantially extends the reliability, availability, and scalability of the webMethods Integration Server. Clustering accomplishes this by providing the infrastructure and tools to deploy multiple Integration Servers as if they were a single virtual server and to deliver applications that leverage that architecture.

Scalability—Without clustering, only vertical scalability is possible. That is, increased capacity requirements can only be met by deploying on larger, more powerful machines, typically housing multiple CPUs. Clustering provides horizontal scalability, which allows virtually limitless expansion of capacity by simply adding more machines of the same or similar capacity.

Availability—Without clustering—even with expensive Fault‐Tolerant systems—a failure of the system (hardware, java runtime, or software) may result in unacceptable downtime. Clustering provides virtually uninterrupted availability by deploying applications on multiple Integration Servers; in the worst case, a server failure produces degraded but not disrupted service.

Reliability—Unlike a server farm (an independent set of servers), clustering provides the reliability required for mission‐critical applications. Distributed applications must address network, hardware, and software errors that might produce duplicate (or failed) transactions. Clustering makes it possible to deliver “exactly once” execution as well as checkpoint/restart functionality for critical operations.

The following diagram shows clustering in its simplest form:

Load balancing is an optimizing feature you use with clustered webMethods Integration Servers. Load balancing controls how requests are distributed to the servers in the cluster. You must use a third‐party load balancer to perform load balancing. A third‐party load balancer can be useful if you need load balancing for multiple types of servers, for example, Web servers and application servers, in addition to webMethods Integration

Note: webMethods Integration Server clustering redirects HTTP and HTTPS requests only; it does not redirect FTP, FTPS, or SMTP requests.

Integration Server

ExternalClient

Integration Server

Firewall

Cluster of Integration Servers

Distributed Cache



Servers. Third‐party load balancers also offer virtual IP support, but they are not “out of the box.” Most third‐party load balancers perform load balancing in a round‐robin manner or based on network level metrics such as TCP connections and network response time.

What Data Is Shared in a Cluster?In a clustered configuration, the session state for clients connected to all servers in the cluster is stored in the shared distributed cache created by Oracle Coherence. The cache allows transactions in a conversation to be continued on other servers in the cluster. In addition TN query results and ART polling notifications are stored in the distributed cache. For more detailed information about setting these parameters, refer to http://wiki.tangosol.com/display/COH32UG/Coherence+3.2+Home.

Information about scheduled jobs, jobs tracked by guaranteed delivery, and XREF data is stored in an external database, identified by the ISInternal functional alias. In addition, JOIN documents (ANDs and XORs only) and the processing state of activations that are participating in a JOIN are also stored in the database. For more information about these transient logs, see the Publish‐Subscribe Developer’s Guide.

Overview of Failover SupportFailover support allows you to recover from system failures that occur during processing, making your applications more robust. For example, by having more than one Integration Server, you protect your application from failure in case one of the servers becomes unavailable. If the webMethods Integration Server to which the client is connected fails, the client automatically reconnects to another webMethods Integration Server in the cluster.

You can achieve failover support a number of ways:

webMethods Integration Server clustering—With multiple webMethods Integration Servers, if one Integration Server fails, requests can be automatically redirected to another server, thereby avoiding a single point of failure.

Redundant hardware configurations—By specifying hardware configurations that are redundant, for example multiple machines, multiple file servers, and so on, you further reduce the chance of a single point of failure.

Important! Do not perform development work in a clustered environment. Basic namespace locking and the WmVCS package are not supported across Integration Servers in a cluster.

Note: webMethods Integration Server clustering provides failover capabilities to clients that implement the webMethods Context and TContext classes. Integration Server does not provide failover capabilities when a generic HTTP client, such as a Web browser, is used.


http://wiki.tangosol.com/display/COH32UG/Coherence+3.2+Home


Overview of ReliabilitySeveral features of the webMethods Integration Server improve reliability in a clustered or unclustered environment.

Guaranteed Delivery—This feature ensures that a service executes once and only once. It is particularly useful when used with clustering to prevent a restarted service from running on more than one server. This feature is only for use with server‐to‐server communications.

Checkpoint restart services—These services allow you to code your flow service to store state information and other pertinent information in the short‐term data store. If the flow service fails because a server becomes unavailable, the flow service can be restarted from the last checkpoint rather than at the beginning.

Guaranteed Delivery and ClusteringGuaranteed delivery ensures one‐time execution of services by guaranteeing the following:

Requests from the client to execute services are delivered to the server.

Services are executed once, and only once.

Responses from the execution of services are delivered to the client.

Guaranteed delivery is useful with or without clustering. If you are not using clustering, guaranteed delivery makes sure a client resubmits a service request until it succeeds and a response is returned. Guaranteed delivery also makes sure the service executes only once. For example, if the network connection between the client and the webMethods Integration Server fails after execution but before the response is successfully redirected to the client, the service might be executed twice. Guaranteed delivery prevents this from happening.

With clustering, if the server on which the service is running becomes unavailable, the client retries the service on another server in the cluster. If the request fails against all servers in the cluster, you must use guaranteed delivery to guarantee execution. As in an unclustered environment, you must use guaranteed delivery to prevent a service from executing more than once.

Checkpoint RestartwebMethods Integration Server provides a number of built‐in services (in the pub.storage folder) that you can use to make your flows more robust. With these services, you can write state information and other pertinent data to a data store in the Integration Server short‐term data store. If the webMethods Integration Server on which your flow is executing becomes unavailable, when your flow restarts it can check the state information in the data store and begin processing at the point where the flow was interrupted.



Putting it All Together This table summarizes how checkpoint restart, clustering, and guaranteed delivery work together to provide availability, failover, and reliability in different situations.

Checkpoint restart specified

Clustering specified

Guaranteed Delivery specified

If the server on which the service is running becomes unavailable…

Point at which the service restarts

After the server becomes available, you must manually restart the service.

At the beginning

After the server becomes available, the client automatically restarts the service. The client keeps trying to run the service until it runs successfully.

At the beginning

The client automatically retries the service on another server in the cluster. If that server fails, the client retries the service on the next server in the cluster, and so on. If all attempts fail, you must manually restart the service.

At the beginning

The client retries the service on the next server in the cluster. If that server fails, the client retries the service on the next server in the cluster, and so on. The client continues to try running the service until it runs successfully.

At the beginning

When the server becomes available again, you must manually restart the service.

Flow services: At the specified checkpoint

Other services: At the beginning



webMethods Integration Server Session ObjectsThe clustered Integration Servers create and maintain the session objects that are stored in the cache.

Servers maintain the session information in their own local memory. In a cluster, a server will create a session in the distributed (or shared) cache, so that when a server redirects a client for failover, the new server can access the session information.

The server that initially receives a request from a client creates the session object in the cache. Other servers in the cluster can access the session object to access and update session information as necessary.

When you configure a server to use clustering, you specify a setting that indicates how long inactive session objects are maintained in the cache. Periodically, each server in the

When the server becomes available again, the client automatically retries the service. The client continues trying to run the service until it completes successfully.



The client automatically retries the service on the next server in the cluster. If that server fails, the client retries the service on the next server in the cluster, and so on. If attempts on all servers fail, you must restart the service manually.



The client automatically retries the service on the next server in the cluster. If that server fails, the client retries the service on the next server in the cluster, and so on. The client keeps trying until the service runs successfully.


Other services: At the beginning.

Checkpoint restart specified

Clustering specified

Guaranteed Delivery specified

If the server on which the service is running becomes unavailable…

Point at which the service restarts



cluster checks the session objects in the cache to determine if any have expired, and if so, removes them.

Scheduled JobsYou can schedule jobs to run on one, any, or all Integration Servers in the cluster. Integration Server stores information about scheduled jobs in the external database identified by the ISInternal functional alias. For jobs to run in the cluster, the server must be enabled for clustering and existing jobs must be flagged to run in the cluster. For instructions, see the chapter about managing services in the webMethods Integration Server Administrator’s Guide.

Client ApplicationsServer clustering is almost transparent to the client. A client can issue requests to a server that is in a clustered environment in the same way it issues a request to a server that is not in a clustered environment.

When a client connects to a server that is in a clustering environment (using the Context class), the server returns information about the other servers in the cluster to the client. In the event that a request is not processed, the client can use this information to connect to another server in the cluster to have the request fulfilled.

Whether you are using the Context or TContext class to communicate with the webMethods Integration Server, the redirection of failed requests is transparent to the client. The logic to handle redirection is in the webMethods Integration Server Context or TContext class.

To improve the failover capability, before your client calls Context or TContext to connect to an Integration Server in the cluster, have your client issue the setRetryServer method in that class to specify another server to try if the first server the client tries to connect to is unavailable.

No special processing is required in your clients.

Note: webMethods Integration Server clustering provides failover capabilities to HTTP‐based webMethods clients, such as those clients built using the webMethods Context and TContext classes.


2 Installation and Setup

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Setting Up the webMethods Integration Servers in the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Using Guaranteed Delivery with Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21



Overview This section summarizes the steps you must take to set up server clustering:

Ensure you have the required software. Each Integration Server in the cluster must be release 7.1 or later.

Install the webMethods Integration Servers and configure them for clustering. A webMethods Integration Server must be installed and running on each machine you want to include in the cluster. In addition, each webMethods Integration Server must be configured for clustering. Instructions are provided in “Setting Up the webMethods Integration Servers in the Cluster” on page 17.

Configure network routers to support multicast traffic. Integration Server uses Oracle Coherence to create a distributed (or shared) cache for storing volatile data. This caching software uses the multicast method of discovery. As a result, if your cluster will consist of webMethods Integration Servers that reside on different network segments, you must configure the routers for those segments to handle multicast traffic. Consult your IT department if you need help configuring the routers.

Considerations for Using webMethods Trading Networks in a Clustered EnvironmentFor Trading Networks to run in a clustered environment, you must configure Trading Networks so that and user accounts, data cached in memory, and Integration Server properties are synchronized on all servers in the cluster. You control these synchronizations by setting Trading Networks system properties. For more information, see the webMethods Trading Networks Administrator’s Guide.

In addition, all Trading Networks instances must use the same database.

Considerations for Using webMethods Adapters in a Clustered EnvironmentFor an adapter to run in a clustered environment, a duplicate version of it must exist on each machine in the cluster. Maintaining duplicate versions of an adapter requires manual tasks. For current cluster configuration information for adapters, see the documentation provided with each adapter.



Setting Up the webMethods Integration Servers in the ClusterIt is recommended that you maintain the same environment for all servers in a cluster, for example, the same packages of services and ACLs that protect services. For information on the server environment variables that you should maintain, refer to “Server Environment” below.

You can have a variety of webMethods Integration Servers in your cluster, for example one server on HP‐UX, one on Solaris, and one on NT.

Server EnvironmentAlthough not required, it is recommended that the servers in a cluster each have the same server environment. For server clustering to work effectively, you should keep the following the same on all servers in the cluster:

All servers should have the same packages of services. In order for a service to execute on any server in the cluster, that service must exist in the same package on every server in the cluster.

Each server should have the same user accounts. The server uses user account information to authenticate clients. When a server redirects a request to an alternate server, the alternate server re‐authenticates the user using the credentials supplied to the original server. If authentication fails, the request fails.

Access to services should be the same on all servers. The server uses group information and ACLs to determine whether a client has access to a service. All servers should have the same groups with the same group membership. Services should be protected by the same ACLs. The ACLs should identify the same Allow Groups and Deny Groups.

If a server determines that a client is allowed to access a service, it assumes the other servers in the cluster will also allow access and will redirect the request. If it is redirected to a server that denies access, the request fails.

Each server should have the same event subscriptions. The server saves information for event types and event subscriptions in the eventcfg.bin file. This file is generated the first time you start a webMethods Integration Server and is located in the following directory: IntegrationServer_directory\config. Copy this file from one server to another to duplicate event subscriptions on all servers in the cluster.

Clock time should be the same on all servers. The clocks on all machines in the Integration Server cluster must be synchronized for scheduled jobs to work properly in a cluster.

Note: It is recommended that you use the package replication functionality in the Integration Server Administrator to copy packages to other servers in the cluster, instead of using Developer to copy them. For information about package replication, see the webMethods Integration Server Administrator’s Guide.



Each server should connect to the same broker. All servers must connect to the same Broker and present the same client prefix for the Broker.

Each server should connect to the same set of databases. For example, if you are using the audit database to store audit data, all servers must connect to the same audit database. If you are using a back‐end database for application‐related data, all servers must connect to the same back‐end database.

Each server should have its own diagnostic port. If you plan to troubleshoot using the diagnostic port, you must configure a diagnostic port for each server in the cluster. The diagnostic port can access only the Integration Server on which it is defined. For more information about the diagnostic port, see the webMethods Integration Server Administrator’s Guide.

Switching from the Embedded Database to an External RBMSFor Integration Servers to participate in a cluster, they must share database components in an external RDBMS. If you installed an Integration Server with the embedded database, but now want to add it to a cluster, you must switch the Integration Server to use the shared external RDBMS.

Follow these steps:

1 In Integration Server Administrator, navigate to the Security > Certificates > Client Certificates screen, and make a note of the certificate mappings.

If the cluster already exists, create JDBC connection pools for the IS Internal and Cross Reference database components, and point the ISInternal and Xref functions at the shared IS Internal and Cross Reference database components. See the webMethods Installation Guide for instructions.

If those database components do not yet exist, create them and connect all Integration Servers in the cluster to them.

2 Run the migration utility pub.scheduler:migrateTasksToJDBC to migrate your scheduled tasks from the embedded database to the external RDBMS. See the webMethods Integration Server Built‐In Services Reference for instructions.

3 In Integration Server Administrator, navigate to the Security > Certificates > Client Certificates screen and re‐specify your certificate mappings. For instructions on mapping a client certificate to a user, refer to webMethods Integration Server Administrator’s Guide.

4 Configure Integration Server as described below.

Note: This service migrates scheduled tasks only; client mappings and run‐time data will not be migrated.



Configuring the Server to Use ClusteringAfter you ensure that a server has the appropriate environment, you can add it to the cluster by configuring the server to use clustering.

Many of the parameters you will specify to configure clustering are used by the caching software, Oracle Coherence. For more detailed information about setting these parameters, refer to http://wiki.tangosol.com/display/COH32UG/Coherence+3.2+Home.

Enabling Clustering and Specifying Clustering ParametersUse the following procedure to turn clustering on for this Integration Server and to specify various clustering parameters:

1 In the Settings menu of the navigation area, click Clustering.

2 Click Edit Cluster Settings.

3 Click Enable Cluster.

4 Specify the following information:

Important! To execute this procedure, you must have webMethods Integration Server administrator privileges. If you do not have administrator privileges, have your webMethods Integration Server administrator perform this procedure.

For this parameter… Specify…

Cluster Name Name of the cluster to which this Integration Server belongs.

If you specify a name that does not already exist, the caching software creates a new cache for this cluster.

If you specify the name of a cluster that already exists, the caching software adds this Integration Server to that cluster.

An enterprise can have more than one cluster. The cluster name allows the caching software to form separate caches for each cluster. Without a cluster name, all Integration Servers that share the same Discovery Address and Discovery Port would form a single cache.

Session Timeout Number of minutes an inactive session will be retained in the clustered session store. Default is 60.

Set this timeout value longer than the session timeout value, which governs how long a server keeps session information in its memory. (The server displays the session timeout on the Resources screen.)

Data Port Port that receives data from other members of the cluster. This number must be an integer. The default is 24547.


http://wiki.tangosol.com/display/COH32UG/Coherence+3.2+Home


5 Click Save Settings.

6 Verify that all nodes in the cluster are displayed under Cluster Hosts.

7 Restart the webMethods Integration Server.

Discovery Address IP address used to identify members of the cache that are to be used for clustering. This address must be in the range 224.0.0.0 ‐ 239.255.255.255. Note that many addresses in this range are reserved for use by certain organizations. Addresses in the range 224.0.0.0 through 224.0.0.255 are reserved for the use of various multicast‐related protocols. For a list of reserved multicast addresses, refer to http://www.iana.org/assignments/multicast‐addresses

The discovery address, in combination with a particular discovery port (described below), uniquely identifies a multicast group. For example, all cache members with the discovery address 224.0.36.1 and discovery port 24601 form one multicast group.

The caching software allows all the members of the same multicast group to access the current state of every cached element in the group.

When defining a cluster, it is recommended that you define one discovery address/discovery port combination (multicast group), per cluster.

Discovery Port Port used to identify members of the cache that are to be used for clustering. This number must be an integer. The default is 24601.

The discovery port, in combination with a particular discovery address (described above), uniquely identifies a multicast group. For example, all hosts with the discovery address 224.0.36.1 and discovery port 24601 form one multicast group.

Time to Live The maximum number of network segments that a multicast packet is allowed to traverse. Specify a number from 0 to 255. Set this number to that lowest value that works in your environment. The default is 1.

For this parameter… Specify…


http://www.iana.org/assignments/multicast-addresses


Scheduling Jobs to Run in the ClusterYou can schedule jobs to run on one, any, or all Integration Servers in the cluster. For jobs to run in the cluster, the server must be enabled for clustering and existing jobs must be flagged to run in the cluster. For instructions, see the chapter about managing services in the webMethods Integration Server Administrator’s Guide.

Using Guaranteed Delivery with ClusteringThe guaranteed delivery capabilities of the webMethods Integration Server ensure guaranteed one‐time execution of services. Guaranteed delivery ensures the following:

Requests from the client to execute services are delivered to the server.

Services are executed once, and only once.

Responses from the execution of services are delivered to the client.

With clustering, if the server on which the service is running becomes unavailable, the client retries the service on another server in the cluster. However, if the request fails against all servers in the cluster, you must use guaranteed delivery to guarantee execution. In addition, as in an unclustered environment, you must use Guaranteed Delivery to prevent a service from executing more than once.

For more information about guaranteed delivery, refer to the webMethods Integration Server Administrator’s Guide. For more information about developing services that use guaranteed delivery, refer to the Guaranteed Delivery Developer’s Guide.

When you use webMethods Integration Server clustering, guaranteed delivery stores information about requests for all the servers in the cluster in a shared, centrally located database. Because the information is stored centrally, guaranteed delivery uses a locking mechanism to synchronize updates to the database.

Update attempts to the database may time out. As a result, the server may require several attempts to complete a request. You can limit how long a cluster server sleeps between attempts to place an update lock on the database. In addition, you can set a maximum time limit after which the server overrides a lock on the database held by a non‐responsive server in order to complete the update.

1 Open the Integration Server Administrator if it is not already open.

2 In the Settings menu of the Navigation panel, click Extended.

The server displays a screen that lists configuration properties specified in the server.cnf file.

To configure cluster database locking



3 To set how long a cluster server waits between attempts to place an update lock, locate the watt.server.tx.cluster.lockTimeoutMillis parameter. If this parameter does not exist, add it. Set it as follows:

4 To set how long a cluster server will wait before overriding a lock, locate the watt.server.lockBreakSecs parameter. If this parameter does not exist, add it. Set it as follows:

5 Click Save Changes.

6 Restart the server to put your changes into effect.

For more information about using the Extended Settings facility to configure the Integration Server, see the webMethods Integration Server Administrator’s Guide.

Set this parameter: To specify…

watt.server.tx.cluster.lockTimeoutMillis The number of milliseconds a cluster server waits between requests to place an update lock.

Set this parameter: To specify…

watt.server.tx.cluster.lockBreakSecs The number of seconds a cluster server waits before overriding a lock.


3 Managing Server Clustering

Viewing the Servers in the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Removing a Server from the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Adding a Server Back into the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Taking a Server Offline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Bringing a Server Back Online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26



Viewing the Servers in the ClusterWhen you have server clustering enabled, you can display a list of all the servers in the cluster. The server learns of other servers in the cluster by retrieving information from the distributed (or shared) cache.

1 Start the Integration Server Administrator. See the webMethods Integration Server Administrator’s Guide if you need help with this step.

2 In the Settings menu of the navigation area, click Cluster.

The server displays the list of servers in the cluster.

Removing a Server from the ClusterIf you no longer want a server to be part of a cluster, disable clustering on that server.

1 If you are using guaranteed delivery, you must take the server offline before removing it from the cluster. If you do not, the server will continue to receive transactions, but the transactions will not be integrated into the database and may be lost if the server is restored to the cluster. See “Taking a Server Offline” on page 26 for instructions on taking a server offline.

2 If it is not already running, start the Integration Server Administrator. See the webMethods Integration Server Administrator’s Guide if you need help with this step.



5 Click Disable Cluster to turn clustering off for this server.


7 Restart the server.

To view a list of all clustered servers

To remove a server from the cluster



Adding a Server Back into the ClusterYou can add a server back into a cluster by enabling clustering on the server again. When clustering was previously enabled, the server stored the configured settings in the server.cnf file in the IntegrationServer_directory\config directory and in some Oracle Coherence configuration files. If these settings have not been deleted or altered, when you re‐enable clustering, the server will have all previous clustering settings configured.

When you enable clustering, ensure that the server uses the same license key as the rest of the servers in the cluster.

1 If it is not already running, start the Integration Server Administrator of the server you want to add back into the cluster. See the webMethods Integration Server Administrator’s Guide if you need help with this step.



4 Click Enable Cluster to turn clustering on for this server.

5 Make any changes you want to the configuration. For more information about how to specify the configuration information, refer to “Configuring the Server to Use Clustering” on page 19.


7 Restart the server.

To add a server back into the cluster

Important! If you are using guaranteed delivery, the server you are returning to the cluster might be offline. When a server is offline, it can only be accessed through a single, unpublished port. Before you can use the server, you must first bring it back online. See “Bringing a Server Back Online” on page 26 for instructions.



Taking a Server OfflineThere may be times when you need to take a server offline. When you take a server offline, you are limiting access to it through a single, unpublished port.


2 In the Security menu of the navigation area, click Ports.

3 Click Change Primary Port.

4 In the Select New Primary Port area of the screen, select the port you want to make the primary port from the pull down menu. Select a port that does not already exist, for example 5556. If 5555 was your primary listening port, this step replaces port 5555 with 5556.

5 Click Update.

6 On your browser, update the URL for the Integration Server Administrator to use the new port.

7 Disable all other listening ports.

Bringing a Server Back Online There may be times when you need to bring a server back online. For example, you might have taken a server offline because you run guaranteed delivery and you needed to remove the server from the cluster. (When a server is offline, it can only be accessed through a single, unpublished port.) After you add the server back to the cluster, you must bring it back online.


2 In the Security menu of the navigation area, click Ports.

3 Click Change Primary Port.

4 Change the primary listening port back to the published primary port for the server.

Note: If you have guaranteed delivery, you must take a server offline before removing it from the cluster.

To take a server offline

To bring a server back online



5 Click Update.

6 On your browser, update the URL for the Integration Server Administrator to the new listening port you just defined.

7 Enable all disabled listening ports.


4 Building Checkpoint Restart into Flows

Checkpoint/Restart Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Sample Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31



Checkpoint/Restart ServiceswebMethods Integration Server provides a number of built‐in services (in the pub.storage folder) that you can use to make your flows more robust. With these services, you can write state information and other pertinent data to a short‐term store. If the webMethods Integration Server on which your flow service is executing becomes unavailable, when your flow service restarts it can check the state information in the short‐term store and begin processing at the point where the flow service was interrupted.

Note: In order to use the pub.storage services, you must define the ISInternal functional alias to point to an external database.

Important! These services are a tool for maintaining state information in the short‐term store. It is up to the developer of the flow service to make sure it keeps track of its state and correctly handles restarts.



Sample Flow The following sample flow service uses checkpoint restart services to keep track of state information and handle restarts accordingly.

Logic in the example flow



The following diagram explains the logic of the flow and shows where the various pub.storage services are used.


The pub.storage services are documented in the webMethods Integration Server Built‐In Services Reference.

When storing and retrieving the flow state in the short‐term store for checkpoint restart purposes, be sure the value of the key you use for the pub.storage:get and pub.storage:put services is unique to the transaction.

Check state/session information in short-term store using

pub.storage:get

Store state “0” in short-term store using pub.storage:put

Execute doSomething0 service

Store data and state “1” inshort-term store using


Store data and state “2” inshort-term store using


Store state “cleanup and exit” in short-term store using

Clean up and exit usingpub.storage:remove

Is state 0?

Is state null?

Is state 1?

Is state 2?

Yes

No

Yes

Yes

Yes

No

No



The sample shown below, which shows the pipeline in detail, uses the purchase order identifier (POID) of the document for the key.

Detailed pipeline example


A Short-Term Store Guidelines

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Usage Tips for the pub.storage Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36



IntroductionThis appendix contains information that will help you use the short‐term store safely and efficiently. It includes tips on using the pub.storage services and information about configuration parameters you can set for the short‐term store.

Usage Tips for the pub.storage ServicesThe pub.storage services are used for accessing the Integration Server short‐term store.

When using the pub.storage services, remember that the short‐term store is not intended to be used as a general‐purpose storage engine. Rather, it is primarily provided to support shared storage of application resources and transient data in a webMethods Integration Server clustered environment. Consequently, you are strongly encouraged against using the short‐term store to process high volumes of large data records or to permanently archive such records.

To maintain data integrity, the short‐term store uses locking to ensure that multiple threads do not modify the same entry at the same time. For insertions and removals, the short‐term store sets and releases the lock. For updates, the client must set and release the lock. Using locking improperly, that is, creating a lock but not releasing it, can cause deadlocks in the short‐term store.

The following guidelines will help you avoid short‐term store deadlock:

Release locks from the session in which they were set. In other words, you cannot set a lock in one session and release it in another. The safest way to do this is to release each lock in the flow service that acquired it.

Always pair a call to “get” or “lock” with a call to “put” or “unlock” so that every lock is followed by an unlock. In addition, use a Try‐Catch pattern in your flow service so that an exception does not prevent the flow service from continuing and releasing the lock.

Set limits on how long the threads will wait for a lock to be set or released. By setting finite limits, you allow the pub.storage package to release locks after a set amount of time and thereby avoid a deadlock situation.

The following paragraphs explain these guidelines in more detail.

The following example shows a simple flow service that pairs a lock with an unlock. If divideInts runs without error, the flow service completes and releases the lock. However, if an error occurs in divideInts, the unlock operation does not run. Further attempts to create a lock will block the requesting thread.




The next example shows the same flow service, but with a Try‐Catch pattern. Even if there is an error in divideInts, the flow service continues, falls through to the Catch portion, and releases the lock.


Although essential, these techniques are not sufficient for some situations. For example, an Integration Server or hardware crash might result in a prematurely terminated flow service, thereby leaving an outstanding lock on the pub.storage package. Or, a client flow service might hang while requesting a lock. In these situations, having limits on how long a lock can exist (lock duration) or how long a lock request will wait (lock wait) can prevent a pub.storage package deadlock.

By default, a lock can exist for 180000 milliseconds (3 minutes). After 3 minutes, the server forcibly releases the lock. You can change the lock duration by using the watt.server.storage.lock.maxDuration property from the Settings>Extended Settings screen on the Integration Server Administrator.

By default, a lock request will wait 240000 milliseconds (4 minutes) to obtain a lock. You can change this default by updating watt.server.storage.lock.maxWait property from the Settings>Extended Settings screen on the Integration Server Administrator. If a pub.storage



service specifies a lock wait, Integration Server uses this value instead of the value specified on the watt.server.storage.lock.maxWait property.

The pub.storage chapter of the webMethods Integration Server Built‐In Services Reference contains more information using locks in the short‐term store.


7-1-1 integration server clustering guide

Documents

clustering parameters

webmethods integration

server offline

webmethods adapters

setup overview

overview of clusteringwhat

webmethods trading networks

additional documentation