cmg2006 paper 6168

The Association of System Performance Professionals

The Computer Measurement Group, commonly called CMG, is a not-for-profit, worldwide organization of data processing professionals committed to the measurement and management of computer systems. CMG members are primarily concerned with performance evaluation of existing systems to maximize performance (e.g., response time, throughput) and with capacity management, where planned enhancements to existing systems or the design of new systems are evaluated to find the resources required to provide adequate performance at a reasonable cost.

This paper was originally published in the Proceedings of the Computer Measurement Group’s 2006 International Conference.

For more information on CMG please visit http://www.cmg.org

Copyright 2006 by The Computer Measurement Group, Inc. All Rights Reserved. Published by The Computer Measurement Group, Inc., a non-profit Illinois membership corporation. Permission to reprint in whole or in any part may be granted for educational and scientific purposes upon written application to the Editor, CMG Headquarters, 151 Fries Mill Road, Suite 104, Turnersville, NJ 08012. Permission is hereby granted to CMG members to reproduce this publication in whole or in part solely for internal distribution within the member’s organization provided the copyright notice above is set forth in full text on the title page of each item reproduced. The ideas and concepts set forth in this publication are solely those of the respective authors, and not of CMG, and CMG does not endorse, guarantee or otherwise certify any such ideas or concepts in any application or usage. Printed in the United States of America.


VoIP Capacity Testing

By Garland Kan, Member, Technical Staff, Spirent Communications

Sunnyvale, CA 94089-1003

Migration from legacy POTS (plain old telephone system) to VoIP (voice over Internet protocol) technologies has not always proceeded smoothly. The migration must consider many complex issues, including security, reliability, quality of service, and interoperability. Testing is critical to making such migrations smooth and effective. This paper describes issues encountered during migrations, along with testing results, and focuses on tests that show the capabilities of a VoIP system.

Introduction

Deploying a new system into production is difficult. For voice, the burden of a reliable deployment is even greater: not only is voice critical to a company's operation, but users have come to expect top quality and reliability from years of using legacy POTS (plain old telephone system). Testing makes new deployments safer and more effective. Without testing, a system's key metrics cannot be properly quantified, and problems may surface during live usage. If a problem is found after the system is turned on and in production, it becomes much more difficult to deal with. For example, should the problem be fixed in real time, or should the users be moved to another system while it is troubleshot? At times there may be no fully redundant system to move all the users to, so the only option is to troubleshoot and test while live users are on the system.

One way to avoid this undesirable scenario is to test before the system goes live, when most problems can be identified and flushed out. Capacity testing a softswitch or voice network prior to live usage yields a wealth of useful information. It provides hard numbers on the maximum rates at which the system can operate and on the voice quality users will experience at those rates. With this information, the system can be tuned for better performance or voice quality before customers actually use it.

What users are accustomed to

Users experience, and therefore expect, high voice quality and reliability not only from their landline telephones but even from cellular phones. Switching to VoIP at the edge (VoIP phones) and/or in the core (VoIP transport) can introduce problems that users rarely experience with traditional phones. Problems such as packet jitter, delay, and packet loss can impact voice even in small amounts. Jitter can cause voice gaps, because packets may arrive too late to be reassembled into the voice stream. If a voice packet is delayed too long, it is no longer useful: by the time it arrives, its portion of the voice stream should already have been played to the listener, so it is discarded. The result is a missing section of the voice stream, which can affect the quality and intelligibility of the conversation. Jitter has the same effect: if the gap before a packet arrives is large enough, the packet's usefulness has expired and it is not played into the voice stream. Without capacity testing, these problems might not be noticed until a large number of users are on the network, which is why testing a softswitch or voice network prior to live usage is important.

Problems users and service providers might face

Most users don't care about the underlying technology behind an activity; they gravitate to the most effective and most affordable method. For telephony, VoIP is currently one of the cheaper ways to make phone calls, and it offers some features that other phone services do not. Users who switch to VoIP through services such as Vonage or Skype-Out expect decent voice quality, since they are paying for the service. A POTS (landline) call uses a dedicated line, whereas a VoIP call crosses a public network and shares bandwidth with others. Moving to this type of VoIP service puts the user's calls on the Internet, with no guarantee that they will work or have good voice quality. Routing a VoIP call over the public Internet is therefore not the best solution: at times the voice stream can experience packet loss, high jitter, or even port blocking by unfriendly service providers.

For the best possible experience, a user would usually need to be on the network of the service provider offering the VoIP service. When service providers offer VoIP service, they control how data packets flow through their network. QoS (quality of service) can then be applied to the voice streams right from the user's home, giving those packets priority over other traffic such as web or mail. This ensures that the call receives a certain amount of bandwidth even when it shares the same pipe with other traffic.

Voice networks

A softswitch (or voice network) deployment is a complex project in many ways. The network may or may not need to handle other traffic simultaneously. Quality of service must be configured for both voice call setup (signaling) and media. Security must be in place to protect users and core telephony servers. With so many pieces in a voice network, everything has to be configured correctly and work together to provide high voice quality and reliability. Comprehensive capacity testing verifies that the entire network runs harmoniously after VoIP traffic has been added.

New network technologies are being deployed to carry various types of traffic on the same medium. Technologies such as MPLS (multiprotocol label switching) and traffic engineering can provide an overlay on the current network that lets voice traverse it efficiently and in a timely manner. These technologies give service providers more control over how data traverses the network; if done correctly, users get a better experience when their VoIP calls traverse that network.

What is capacity testing?

Capacity testing loads the network up to the point where the mechanisms that protect voice stream quality have to take effect and perform some action.
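One such mechanism sits at the receiving endpoint: the jitter buffer, which holds arriving packets briefly and discards any that miss their playout deadline, producing the voice gaps described under "What users are accustomed to". The sketch below is a minimal illustration and not taken from this paper: it assumes a fixed-delay jitter buffer, millisecond timestamps, and a playout schedule anchored to the earliest-sent packet, and it also computes the RFC 3550 interarrival jitter estimate that RTP receivers report. The names `Packet`, `late_packets`, and `rfc3550_jitter` and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int           # RTP sequence number
    sent_ms: float     # sender media timestamp, converted to milliseconds
    arrived_ms: float  # local arrival time in milliseconds

def late_packets(packets, playout_delay_ms):
    """Sequence numbers of packets that miss their playout deadline.

    Assumes a fixed (non-adaptive) jitter buffer: playout is anchored to the
    earliest-sent packet's arrival, and every packet is due playout_delay_ms
    after its nominal position in the stream.
    """
    ref = min(packets, key=lambda p: p.sent_ms)
    late = []
    for p in packets:
        deadline = ref.arrived_ms + (p.sent_ms - ref.sent_ms) + playout_delay_ms
        if p.arrived_ms > deadline:  # too late to be played; the listener hears a gap
            late.append(p.seq)
    return sorted(late)

def rfc3550_jitter(packets):
    """Interarrival jitter estimate J as defined in RFC 3550, section 6.4.1.

    D is the change in transit time between consecutive packets in arrival
    order; J moves 1/16 of the way toward |D| on each packet.
    """
    ordered = sorted(packets, key=lambda p: p.arrived_ms)
    jitter = 0.0
    for prev, cur in zip(ordered, ordered[1:]):
        d = (cur.arrived_ms - prev.arrived_ms) - (cur.sent_ms - prev.sent_ms)
        jitter += (abs(d) - jitter) / 16
    return jitter

# Hypothetical stream: one packet every 20 ms, packet 2 badly delayed.
stream = [Packet(0, 0, 50), Packet(1, 20, 75), Packet(2, 40, 160), Packet(3, 60, 115)]
print(late_packets(stream, playout_delay_ms=40))
print(round(rfc3550_jitter(stream), 2))
```

Packet 2 arrives 70 ms after its nominal slot, so a 40 ms buffer discards it while a 70 ms buffer would keep it, at the cost of more end-to-end delay for every packet. That trade-off is exactly what only becomes visible once the network is loaded enough for jitter to grow.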
Capacity testing means saturating the voice network or softswitch with a large number of calls to trigger these mechanisms and, if necessary, make them take corrective action. At low rates these mechanisms may be in use but not noticeable. Counters and logs on various devices can show that different layer 2 and layer 3 (OSI reference model) mechanisms are being utilized, but they cannot show how effective those mechanisms are when the system is fully utilized. Capacity testing is usually performed when there is no traffic on the system besides the test traffic. With other traffic present, such tests cannot provide an accurate prediction, because they do not account for traffic the test itself did not generate.

What kind of tests to perform and why?

What voice quality do users get at the maximum rates? Can the network or softswitch handle the designed load? If it cannot, the problem can be narrowed down to pinpoint which device in the network is the bottleneck or is configured incorrectly. Testing can also show whether the logical layer 2 and layer 3 services are working correctly and performing as expected. The following tests answer these questions.

Calls Per Second

This performance test shows how many calls a network or softswitch can process in one second. Setting up a call requires, at minimum, a few steps. The call processing engine (for example, a softswitch, also known as a call manager) must look up the call and authorize it. It then has to keep a record of the call's state, which can be a database entry or as simple as a log file. If the provider wants to be paid for the call, there must be a CDR (call detail record) for the transaction, which may be as simple as reusing the state log file or may involve inserting a record into a database that tracks billable calls. Because every call triggers all of this work, processing a large number of calls can overwhelm the system, and the performance limits of its different parts start to show. For example, the database might not be able to handle more than X transactions per second, or memory consumption and CPU utilization may climb as more and more calls are initiated. Softswitch manufacturers usually publish engineering specifications stating how many calls per second their systems can handle.
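A rough way to reason about which part of the call setup path limits the CPS rate is a simple bottleneck model: divide each resource's capacity by how much of it a single call consumes, and take the smallest quotient as the ceiling. The sketch below assumes the stages are independent and that each call costs a fixed number of operations per stage; the stage names, the `Stage` type, `projected_cps`, and all numbers are hypothetical and are not measurements from this paper.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    capacity_per_second: float  # operations this resource can complete per second
    ops_per_call: float         # operations one call setup consumes on it

def projected_cps(stages):
    """Return (ceiling_cps, bottleneck_name) under a simple bottleneck model.

    Assumes stages do not interact; real systems share CPU, memory and I/O,
    so treat the result as an upper bound to be confirmed by a CPS test.
    """
    limits = {s.name: s.capacity_per_second / s.ops_per_call for s in stages}
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck

# Hypothetical figures for illustration only.
stages = [
    Stage("authorization lookup", capacity_per_second=2000, ops_per_call=1),
    Stage("call-state database", capacity_per_second=1500, ops_per_call=3),  # e.g. insert, update, delete
    Stage("CDR writer", capacity_per_second=900, ops_per_call=1),
]
cps, where = projected_cps(stages)
print(f"projected ceiling: {cps:.0f} CPS, limited by {where}")
```

A projection like this is a theoretical figure: it is only as good as its per-stage estimates, and it ignores contention between stages.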
A manufacturer's CPS figure may be derived either theoretically, by estimating what each component of the system can do and combining those estimates into a projected CPS (calls per second) number, or empirically, by actually running a CPS test. Determining which method was used requires either running your own test or reviewing the manufacturer's data to confirm how the number was derived. Moreover, your deployment probably uses best-of-breed components that will not match the equipment the manufacturer has or used. Different equipment can raise or lower the performance limits of particular parts of the network.

CPS Test Setup

There are two ways to set up a CPS test. The first uses just enough endpoints to achieve the desired CPS rate. The second uses the maximum number of endpoints the system supports to achieve the same CPS rate. Without knowledge of the softswitch or network internals, the second method is preferable because it stresses more parts of the system.

Concurrent Call Test

The concurrent call test brings up the maximum number of calls that the softswitch or network can handle. Since this test is not trying to stress CPS rates, the calls are ramped up slowly. It stresses the network differently than the CPS test: it exercises network resources such as bandwidth and quality of service (QoS) mechanisms more than call processing activities such as softswitch CPU cycles, database queries, state table updates, or CDR generation. The test shows how many concurrent calls the system can handle and what the voice quality is when the maximum number of concurrent calls are active on the system or network. Although there are many voice quality measurement standards, the most widely used currently is the ITU-T P.862 PESQ (perceptual evaluation of speech quality) algorithm. This industry-standard method produces a score between -0.5 and 4.5, with 4.5 being the highest possible score. Scores above 4.0 are usually considered good voice quality.

What type of metrics and results will come out of the testing?

These tests provide a wealth of information; the most important results are described below. This information helps engineers and management make decisions about the system. The CPS test gives the maximum rate at which the softswitch or network can connect calls. Using that number, one can plan an upgrade path for the voice network.
Alternatively, if the measured CPS rate falls well short of the design requirement, this test can be used to find configuration problems or network devices that cannot process calls at that load, or to help evaluate alternative solutions that meet the required performance. Figure 1 shows a CPS test detailing the number of originating (green line) and terminating (blue line) calls. The Y axis is the number of channels and the X axis is time. This test ran near the limits of a particular softswitch. Early in the test, with fewer than 800 calls, the blue and green lines overlap, meaning that each originating call was set up quickly. When the number of calls exceeded 800, the lines began to diverge, meaning that the softswitch was taking longer to connect each call. As the test continues, the gap between the lines widens until no new endpoints are started. If more endpoints were started and the gap widened further, the new calls would most likely fail.
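The divergence read off Figure 1 by eye can also be flagged automatically from sampled channel counts. The sketch below is a minimal illustration, assuming the test tool can export originating and terminating channel counts at matching sample times; the function name `first_divergence`, the tolerance, and the sample data are hypothetical and do not come from Figure 1.

```python
def first_divergence(times, originating, terminating, tolerance):
    """Return the first sample time at which originating calls exceed
    terminating (connected) calls by more than `tolerance`, or None.

    A persistent, growing gap means call setup is taking longer than the
    sampling interval, i.e. the softswitch is falling behind.
    """
    for t, orig, term in zip(times, originating, terminating):
        if orig - term > tolerance:
            return t
    return None

# Hypothetical samples (seconds, channel counts).
times       = [0, 10, 20, 30, 40, 50]
originating = [0, 200, 400, 600, 820, 1000]
terminating = [0, 198, 399, 597, 760, 880]
print(first_divergence(times, originating, terminating, tolerance=20))
```

The tolerance absorbs the small, normal lag between a call being originated and being connected; choosing it is a judgment call that depends on the sampling interval.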

Figure 1 – CPS Graph

The concurrent call test shows what the softswitch or network can do when fully loaded with the maximum number of calls possible. Under normal conditions, this load rarely occurs during live usage. Traffic may occasionally spike to that level, but in normal operation it does not stay there. This test gives insight into the network's capabilities and the voice quality at this load. Figure 2 shows a concurrent call test run. It represents a well-behaved voice network that is able to establish the maximum number of calls, just over 1500. The figure shows only the maximum number of calls that can be set up; it does not show the voice quality at maximum load.

Figure 2 – Concurrent Call Graph
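Because the concurrent call test loads bandwidth rather than call processing, a back-of-the-envelope estimate of media bandwidth at full load helps decide whether a link, rather than the softswitch, will saturate first. The paper does not state which codec was used; the sketch below assumes G.711 (64 kbit/s) with 20 ms packetization over IPv4/UDP/RTP without header compression, and uses the roughly 1500 concurrent calls from Figure 2 only as an example. The function names are hypothetical.

```python
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20

def stream_bps(codec_bps, ptime_ms, l2_overhead_bytes=0):
    """Bits per second for one RTP media stream (one direction)."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bps * ptime_ms / 1000 / 8
    packet_bytes = (payload_bytes + RTP_HEADER_BYTES + UDP_HEADER_BYTES
                    + IPV4_HEADER_BYTES + l2_overhead_bytes)
    return packet_bytes * 8 * packets_per_second

def concurrent_calls_bps(calls, codec_bps=64_000, ptime_ms=20, l2_overhead_bytes=0):
    """Total media bandwidth for `calls` two-way calls (two streams per call)."""
    return calls * 2 * stream_bps(codec_bps, ptime_ms, l2_overhead_bytes)

print(stream_bps(64_000, 20))            # 80000.0 bit/s per direction at the IP layer
print(concurrent_calls_bps(1500) / 1e6)  # 240.0 Mbit/s summed over both directions
```

Adding Ethernet framing (`l2_overhead_bytes=18` for the 14-byte header and 4-byte FCS) raises the per-stream figure to 87,200 bit/s. Whether both directions of a call share one link depends on the topology, so the total should be split per link before comparing it with link capacity.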

Figure 3 shows 60 seconds of PESQ scores for the concurrent call test, starting 20 minutes (1200 seconds) into the test. During full load, all PESQ scores stay above 4.39, well above the 4.0 level usually considered good voice quality. Users in a conversation passing through this system at that time would be unlikely to complain about voice quality.
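A run of PESQ samples like the one in Figure 3 can be reduced to a few figures for a test report. The sketch below assumes per-interval PESQ scores have been exported as a list of floats; it applies the 4.0 threshold for good quality and uses the -0.5 to 4.5 P.862 range as a sanity check. The function name `summarize_pesq` and the sample values are hypothetical, not data from Figure 3.

```python
from statistics import mean

PESQ_MIN, PESQ_MAX = -0.5, 4.5   # ITU-T P.862 raw score range
GOOD_THRESHOLD = 4.0             # scores above this are usually considered good

def summarize_pesq(scores):
    """Return min, mean, and fraction of samples above GOOD_THRESHOLD."""
    if not scores:
        raise ValueError("no PESQ samples")
    for s in scores:
        if not PESQ_MIN <= s <= PESQ_MAX:
            raise ValueError(f"score {s} outside P.862 range")
    good = sum(1 for s in scores if s > GOOD_THRESHOLD)
    return {"min": min(scores), "mean": mean(scores), "fraction_good": good / len(scores)}

summary = summarize_pesq([4.41, 4.44, 4.39, 3.85, 4.46])
print(summary)
```

Reporting the minimum alongside the mean matters: a single degraded interval (3.85 here) barely moves the average but is exactly what a user on that call would hear.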

[Plot: PESQ Values versus Time (seconds)]

Figure 3 – PESQ Over Time

Call setup time is another important metric, because users expect a call to be established quickly. With landlines, it usually takes only a couple of seconds before ringback is heard; cell phones take a little longer. Users' expectations of a VoIP connection depend on how they use it. Users calling from home through an ATA (analog telephone adaptor) expect landline-like response. Users of a soft-phone on a computer or a PDA (personal digital assistant) are more likely to accept longer call setup times; they are usually early adopters who are familiar with new technologies and will tolerate some inconveniences, such as longer call setup times and perhaps even lower voice quality.

Test Environment – Hardware and Software

The test environment should encompass all the endpoints that the softswitch or voice network will encounter in production. In a service provider's network this might mean various VoIP protocols (SIP, MGCP, H.248/MEGACO) and different PSTN gateways (T1/E1, SS7) spanning both the VoIP and PSTN sides of the voice network. For a good network-wide test, the test equipment should connect to various endpoints to establish calls. These endpoints are usually at the edge of the provider's network, where its responsibility ends and the handoff to the next service provider begins. Connecting and testing from these locations ensures that the entire network is tested. The connection points will vary from network to network, but the same methodology usually applies.
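Call setup time, discussed just before the test environment, is easiest to compare against user expectations when reported as percentiles rather than an average. The sketch below assumes the test tool records, per call, the time the call was originated and the time ringback was received, both in milliseconds; the record layout, the function name `setup_time_percentile`, and the sample values are hypothetical, and the percentile uses the simple nearest-rank method.

```python
import math

def setup_time_percentile(calls, pct):
    """Nearest-rank percentile of call setup time, in milliseconds.

    `calls` is a list of (originated_ms, ringback_ms) pairs. A ringback of
    None marks a call that never set up; such calls are excluded here and
    should be reported separately as failures.
    """
    delays = sorted(ring - start for start, ring in calls if ring is not None)
    if not delays:
        raise ValueError("no completed call setups")
    rank = max(1, math.ceil(pct * len(delays) / 100))
    return delays[rank - 1]

# Hypothetical per-call records from a test run.
calls = [(0, 1200), (500, 1600), (1000, 2100), (1500, 4500), (2000, None)]
print(setup_time_percentile(calls, 50), setup_time_percentile(calls, 95))
```

In this sample the median is 1.1 s and the mean of completed setups is 1.6 s, yet the 95th percentile is 3.0 s; that tail is what an ATA user expecting landline-like response would notice.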

More specific tests, such as testing a softswitch or narrowing down a bottleneck in the network, connect to specific devices or areas of the network to stress them for performance and voice quality. This reduces the external variables at play during the test, which helps diagnose the problem or obtain measurements for that portion of the network. Figure 4 shows possible test points for a softswitch or voice network test. The test device (Abacus 5000) connects to an SS7 link, to the IP network (emulating various VoIP protocols), and to the PSTN. By testing from many different points on the network and using different protocols and technologies, the test stresses many devices to find their performance and quality levels.

Figure 4 – Network Diagram

The test equipment used for this type of testing should emulate real-world endpoints as closely as possible and support as many voice protocols as possible to simulate a real voice environment. This means that the endpoints generating the calls should mimic a real soft-phone. The software driving the test equipment should also support detailed testing by gathering and calculating statistics on every call the test system generates. This allows each call's performance to be monitored, yielding accurate statistics on the system's performance.

Conclusion

Performing capacity tests on a softswitch or network before a voice deployment is critical. It verifies that everything is configured correctly and that the current configuration supports the required voice quality and call capacity. Testing of this nature allows one to perform the due diligence this kind of deployment requires. The tests described in this paper, when performed on a softswitch or network, provide a great deal of information on how voice traffic will be handled. This information can be used to properly design and tune the system from the start, preventing surprises and unhappy users.


Business