LCCWS: Lightweight Copyfree Cross-layer Web Server

Haipeng Qu

Department of Computer Science, Ocean University of China, Qingdao, China Email: [email protected]

Lili Wen, Yanfei Xu and Ning Wang

Department of Computer Science, Ocean University of China, Qingdao, China Email: [email protected], [email protected], [email protected]

Abstract—To improve web server performance, this paper implements a high-performance web server prototype system named LCCWS. Using PF_RING, a technique similar to zero-copy, the system copies data between the network interface device and a kernel ring buffer in DMA mode and lets the application access that ring buffer through MMAP, so that CPU participation and memory copies are reduced, saving considerable CPU overhead. When splitting and encapsulating data packets, the improved web server uses a lightweight TCP/IP protocol suite that passes packets directly from the data-link layer up to the application layer, reducing the number of copies and accelerating packet processing. LCCWS effectively reduces CPU overhead, decreases data copying between memory regions, and improves transfer efficiency, laying a foundation for further research toward a practical, feature-rich, high-performance web server.
Index Terms—PF_RING, zero-copy, cross-layer design, Web server

I. INTRODUCTION

With the development of the WWW (World Wide Web), the amount of information disseminated on the Internet, most of it through web services, is growing exponentially [1]. As the key node for processing and publishing network information, the web server (also known as a WWW server) must carry more traffic than ever before. This requires higher packet processing rates and lower transfer delays. However, a traditional web server faces performance bottlenecks in CPU, memory, network bandwidth, storage, applications, and so on. Two main factors affect performance. First, in the traditional working mode, packets captured from the NIC (Network Interface Card) are copied to the upper layers, and both capturing and copying consume considerable CPU resources, degrading overall performance. Second, with the traditional protocol suite, packet contents are copied multiple times during splitting and encapsulation, which is not only complex and cumbersome but also time-consuming and memory-intensive.

The current methods for relieving server performance bottlenecks [2] include adding network bandwidth, expanding memory capacity, using SCSI (Small Computer System Interface) disks or RAID (Redundant Array of Inexpensive Disks) storage, adding processors, enlarging caches, optimizing disk I/O, and so on. Nevertheless, these methods are costly and do not address the root causes of web server performance bottlenecks.

Speeding up packet capture and reducing data copies between memory regions can effectively improve communication efficiency and thereby web server performance. The copyfree idea is also referred to as zero-copy. Zero-copy technology [3] reduces operating system and protocol overhead during data transfer and increases the communication bit rate, enabling high-speed communication, by reducing or eliminating the operations in critical communication routines that limit the communication rate. However, there is no uniform standard for zero-copy, and it requires modifications to the network card driver [4], which interferes with the normal use of most web servers and yields poor portability and generality. In this paper, CPU processing overhead is reduced and packet capture performance is improved by loading a PF_RING module, based on zero-copy technology, into the Linux kernel: packets are captured with NAPI, which is based on device polling, and stored in a ring buffer through DMA (Direct Memory Access); the upper layers access the ring buffer directly through MMAP (memory mapping), which reduces copying in kernel space and saves system time and space [5]. PF_RING is an independent module that does not require modifying the network card driver, which improves generality and portability. For packet splitting and encapsulation, the traditional protocol stack is simplified down to the necessary functions, such as packing and unpacking, header checking, and fixed-address communication; the simplified stack reduces copies between protocol layers and passes packets directly from the data-link layer to the application layer to speed up packet

JOURNAL OF NETWORKS, VOL. 8, NO. 1, JANUARY 2013 165

© 2013 ACADEMY PUBLISHER doi: 10.4304/jnw.8.1.165-173

analysis. Based on these two techniques, a lightweight high-performance web server prototype system, LCCWS, is derived from the small open-source web server Mattows [6].

II. RELATED WORK

This paper employs several technologies to relieve web server performance bottlenecks: PF_RING based on zero-copy [7], an improved TCP/IP protocol suite, a static web cache mechanism, and a multi-queue priority control mechanism.

A. PF_RING Technology Based on Zero-copy

Zero-copy [8] is a fast packet processing mechanism [9]: no memory copy occurs during packet splitting and encapsulation, and data is copied only when a packet is sent out by the NIC driver or read by an application. Zero-copy technology relies on DMA and memory mapping.

PF_RING [10] uses device polling, known as NAPI in Linux, in place of the NIC interrupt mode; captured packets are copied into a circular queue in a kernel buffer through DMA, and the interface for reading packets is implemented with MMAP. When a DMA transfer interrupt arrives, packets reach the DMA buffer through DMA channels [11], NIC interrupt mode is shut down, and all subsequent data is received in polling mode; while the kernel polls the device, the NIC generates no further interrupts. When no packets are waiting, or the pending packets have all been received during polling, NIC interrupt mode is enabled again.
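The DMA-plus-MMAP data path can be modeled in a few lines of Python. This is a sketch, not the PF_RING API: an anonymous mmap merely stands in for the kernel ring that PF_RING exports to user space, and the slot layout and sizes are invented for illustration.

```python
import mmap

RING_SLOTS = 4        # number of ring slots (illustrative)
SLOT_SIZE = 2048      # bytes per slot (illustrative; real PF_RING differs)

# Anonymous mapping standing in for the kernel ring buffer that PF_RING
# exports to user space via mmap(); slot layout: 2-byte length + frame.
ring = mmap.mmap(-1, RING_SLOTS * SLOT_SIZE)

def dma_write(slot: int, frame: bytes) -> None:
    """Stand-in for the NIC's DMA engine depositing a frame into a slot."""
    off = slot * SLOT_SIZE
    ring[off:off + 2] = len(frame).to_bytes(2, "big")
    ring[off + 2:off + 2 + len(frame)] = frame

def read_slot(slot: int) -> memoryview:
    """Application-side read: a memoryview into the shared mapping,
    so no bytes are copied out of the ring."""
    off = slot * SLOT_SIZE
    length = int.from_bytes(ring[off:off + 2], "big")
    return memoryview(ring)[off + 2:off + 2 + length]

dma_write(0, b"\x08\x00 demo frame")
assert bytes(read_slot(0)) == b"\x08\x00 demo frame"
```

The key point is that `read_slot` returns a view into the shared mapping rather than a copy, which is the property the MMAP interface provides to the application.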

PF_RING provides a modified network access interface: sockets are created with the PF_RING protocol family instead of PF_PACKET, and packet reception is built on top of the PF_RING network access interface. The working process is shown in Fig. 1 [5].

B. Improved TCP/IP Protocol Suite

TCP/IP is a protocol suite [12] usually divided into four layers, corresponding to the transport, internet, data-link, and physical layers of the OSI reference model. Each layer implements a set of protocols. Its core comprises the transport layer protocols (TCP and UDP), the internet layer protocol (IP), and the physical interface layer; the implementations of these three layers are usually completed in the operating system kernel. The data-link layer is the basis of TCP/IP. The IP layer provides transport service to the transport layer (TCP and UDP); its main functions are addressing, routing, fragmentation, and reassembly. TCP provides dividing/merging, joining/splitting, multiplexing/demultiplexing, confluence/diversion, retransmission, and so on.

Traditional TCP/IP implementations focus on ensuring reliable data transmission and flow control; their real-time performance is poor, their implementation is complex, and they occupy substantial system resources [13]. The improved TCP/IP protocol suite, without violating the protocol standards, narrows the gap between the service requirements of the higher level (session layer) and the services the lower-level network protocol stack can provide; its features are limited to header encapsulation and splitting, header checking, and message acknowledgement in the data-link, internet, and transport layers.

While packets are being split and encapsulated, the improved web server passes the data directly from the data-link layer up to the application layer, reducing how many times packets are copied between the intermediate protocol layers. Only pointers to the data are passed between the layers of the protocol suite, and the data is actually copied only when it is sent out by the driver or read by the application program. The code size and required memory are smaller than for a general-purpose TCP/IP protocol suite, which speeds up packet processing.
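The pointer-passing idea can be illustrated with a minimal sketch. The layout below is hypothetical (fixed 14/20/20-byte Ethernet/IP/TCP headers, no options): each layer writes its header into preallocated headroom and hands down views, so the payload bytes are copied exactly once.

```python
ETH_HLEN, IP_HLEN, TCP_HLEN = 14, 20, 20   # fixed header sizes (no options)
HEADROOM = ETH_HLEN + IP_HLEN + TCP_HLEN

def make_packet(payload: bytes) -> bytearray:
    """One buffer per packet: headroom for all headers, then the payload.
    This is the only time the payload bytes are copied."""
    buf = bytearray(HEADROOM + len(payload))
    buf[HEADROOM:] = payload
    return buf

def fill_tcp_header(buf: bytearray, src_port: int, dst_port: int) -> memoryview:
    """Write the TCP header in place and return a view of the segment;
    no payload bytes move."""
    tcp = memoryview(buf)[ETH_HLEN + IP_HLEN:]
    tcp[0:2] = src_port.to_bytes(2, "big")
    tcp[2:4] = dst_port.to_bytes(2, "big")
    return tcp

pkt = make_packet(b"GET / HTTP/1.1\r\n")
seg = fill_tcp_header(pkt, 49152, 80)
assert bytes(seg[TCP_HLEN:]) == b"GET / HTTP/1.1\r\n"   # payload untouched
assert pkt[34:36] == (49152).to_bytes(2, "big")
```

Lower layers would fill their headers the same way, each receiving only an offset (a pointer, in the C implementation) into the one buffer.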

Figure 1. PF_RING Operating Principle.


There are a variety of TCP/IP programming interfaces, and currently the most popular one is the socket programming interface. A socket is a communication abstraction built on the transport layer protocols [12].

TABLE I. SOCKET STRUCTURE DESCRIPTION

In our paper, using the network access interface that PF_RING provides, we improve the TCP/IP protocol stack, encapsulate the improved stack, and rewrite the web server socket communication functions.

Field                        Contents
addrs                        source MAC, destination MAC, source IP, destination IP, source port, and destination port
sendc                        serial number of the bytes sent
recvc                        serial number of the byte expected to be received
windows                      send window size
windowr                      receive window size
packet[SOCKET_MTU+14]        storage for packets received/sent
TTL                          time to live
MSS                          maximum segment size
family                       protocol family
type                         type identification: SOCK_DGRAM, SOCK_RAW, etc.
protocol                     protocol type: TCP, UDP, etc.
errbuf[PCAP_ERRBUF_SIZE]     error messages
status                       socket status: listening, connected, default, etc.
*device                      NIC device descriptor
*fp                          capture instance descriptor of the opened socket
filter                       filter structure
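The structure in Table I can be mirrored as a sketch. Field names follow the table; the types, defaults, and the SOCKET_MTU value are assumptions for illustration (PCAP_ERRBUF_SIZE is libpcap's documented 256-byte error buffer).

```python
from dataclasses import dataclass, field

SOCKET_MTU = 1500          # assumed MTU; packet[] adds 14 bytes of Ethernet header
PCAP_ERRBUF_SIZE = 256     # libpcap error-buffer size

@dataclass
class FssSocket:
    """Illustrative mirror of the socket structure in Table I."""
    addrs: dict = field(default_factory=dict)   # src/dst MAC, IP, port
    sendc: int = 0                              # serial number of bytes sent
    recvc: int = 0                              # serial number of byte expected
    windows: int = 0                            # send window size
    windowr: int = 0                            # receive window size
    packet: bytearray = field(
        default_factory=lambda: bytearray(SOCKET_MTU + 14))
    ttl: int = 64                               # time to live
    mss: int = 1460                             # maximum segment size
    family: int = 0                             # protocol family
    sock_type: str = "SOCK_RAW"                 # SOCK_DGRAM, SOCK_RAW, ...
    protocol: str = "TCP"                       # TCP, UDP, ...
    errbuf: bytearray = field(
        default_factory=lambda: bytearray(PCAP_ERRBUF_SIZE))
    status: str = "default"                     # listening / connected / default
    device: str = ""                            # NIC device descriptor
    fp: object = None                           # capture instance descriptor
    pkt_filter: object = None                   # filter structure

s = FssSocket(status="listening")
assert len(s.packet) == SOCKET_MTU + 14
```

In the C implementation, `packet`, `errbuf`, `*device`, and `*fp` would be fixed-size arrays and raw pointers rather than Python objects.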

C. The Mechanism of Static Web Cache

The static web cache mechanism has proved extremely effective and is one of the most widely used techniques for accelerating web browsing. It saves network bandwidth by reducing duplicate transmission of requested data, and it reduces the application server's recomputation of dynamic data, lightening the load on both the web server and the application server; ultimately it shortens user waiting time and improves the processing efficiency of the system.

Frequently accessed pages are stored on cache servers as static pages, whose indexes are stored on the web server. When the web server receives a request from a client, it first checks whether the indexes contain the static page the request needs. If so, the static page is fetched from the cache server and sent to the client. Otherwise, a new static page is generated and added to the cache server, and its index is added to the web server cache according to the cache replacement algorithm. The operating principle of the web cache system is shown in Fig. 2.

D. Multi-queue Priority Control Mechanism

Even when the server implementation itself is fast, a large number of network requests can still leave the server in an inefficient state, because the number of requests exceeds the maximum the server can handle. To preserve the server's service capability under such extreme conditions, we propose a multi-queue priority control mechanism.

We separate requests into three queues: a black queue, a white queue, and a gray queue, with priority order white > gray > black. All new client requests enter the gray queue by default. If a request's frequency exceeds a specified threshold over some period, the request enters the white queue and is processed preferentially. When a request is identified as dangerous or untrusted by the server's intrusion detection mechanism, it enters the black queue.

When the server handles client requests, requests in the white queue are processed first. Only when the white queue is empty does the server process requests in the gray queue, and only when both the white and gray queues are empty does it turn to the black queue.

III. LCCWS SYSTEM DESIGN

Lightweight web servers are typically simple in function and fast, and they are generally open source and easy to analyze and improve. This paper develops a high-performance web server prototype, LCCWS, built on the open-source web server Mattows.

Figure 2. Web cache system operating principle

A. The Overall Design of LCCWS

LCCWS is a lightweight web server running on Linux and supporting CGI (Common Gateway Interface). In the operating system kernel, PF_RING with its cache is


loaded as a kernel module, combined with NAPI to reduce the NIC's interrupt frequency. In user space, applications access the network packets in the ring buffer through MMAP. Packet splitting and encapsulation use a simplified TCP/IP protocol suite; only data pointers are passed between the layers of the protocol stack, so the number of copies between layers is reduced. The system design is shown in Fig. 3. To gain further speed, we also adopt the static web cache mechanism and the multi-queue priority control mechanism; their working process is shown in Fig. 5.

B. LCCWS System Implementation

The implementation of LCCWS mainly comprises four modules: loading the PF_RING module into the Linux kernel, implementing the simplified TCP/IP protocol suite, the web caching mechanism, and the multi-queue priority control mechanism.

PF_RING module loading: PF_RING runs in kernel space as a module; it can be dynamically loaded and unloaded and becomes part of the kernel once loaded. The loading process has three steps: kernel upgrade, configuration and loading, and testing [10]. First, we check the kernel version with the "uname" command, download the matching source code and kernel patch, and upgrade the kernel with the command "zcat linux-2.6.25-1-686-smp-PF_RING.patch.gz | patch -p0". Then we configure and compile the kernel, load the PF_RING module, and configure it to support the device polling mechanism. Finally, we verify the result of the loading process with the "cat info" command; if loading succeeded, PF_RING information as in Fig. 4 is shown.

Lightweight TCP/IP protocol suite: LCCWS improves the TCP/IP protocol suite and wraps the improved suite over the network access interface provided by PF_RING. It also rewrites the socket communication functions: fss_socket(), fss_bind(), fss_listen(), fss_accept(), fss_connect(), fss_send(), fss_recv(), fss_close(), etc. The implementation is as follows:

Figure 4. PF_RING Information

In the data-link layer, data frames are built and split. The IP protocol and the TCP protocol are simplified in the internet layer and the transport layer respectively. Each layer works as follows: the data-link layer builds and splits link-layer frames; the IP layer does not implement routing, but checks the IP addresses and checksums of received packets and adds IP headers to packets waiting to be sent; the TCP protocol ensures that data transmission is correct and ordered, using checksums, ACKs, and sequence numbers. In designing the protocol stack, we try to narrow the gap between the service requirements of the upper layer (application layer) and the service the lower-level network can provide; for example, when packaging packets, the maximum TCP segment size, the maximum IP packet size, and the maximum link-layer frame size are set to the same value, which avoids dividing/merging at the TCP layer and fragmentation/reassembly at the IP layer [14]. The simplified protocol suite structure is shown in Fig. 6.
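The header checking mentioned above rests on the standard Internet checksum. A sketch of the RFC 1071 one's-complement sum (not the paper's actual code) is:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit
    big-endian words, with carries folded back in, then complemented."""
    if len(data) % 2:              # odd length: pad with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry
    return ~total & 0xFFFF

# Worked example from RFC 1071: bytes 00 01 f2 03 f4 f5 f6 f7
assert internet_checksum(bytes.fromhex("0001f203f4f5f6f7")) == 0x220D

# Verification property: a header with its checksum inserted sums to zero.
assert internet_checksum(bytes.fromhex("0001f203f4f5f6f7220d")) == 0
```

The receiver-side check the simplified IP layer performs is exactly the second property: checksumming the received header, stored checksum included, must yield zero.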

Figure 3. LCCWS System Design


Figure 5. Simplified TCP/IP stack

Using the network access interface provided by PF_RING, we rewrite the web socket communication functions and redefine the socket structure by wrapping the improved TCP/IP protocol suite. The socket structure includes addrs (the communication information: source MAC, destination MAC, source IP, destination IP, source port, and destination port), sendc (the serial number of the bytes sent), recvc (the serial number of the byte expected to be received), windows (send window size), windowr (receive window size), packet[SOCKET_MTU+14] (storage for packets received and sent), TTL (time to live), MSS (maximum segment size), family (protocol family), type (marking SOCK_DGRAM, SOCK_RAW, etc.), protocol (protocol type: TCP, UDP), errbuf[PCAP_ERRBUF_SIZE] (error messages), status (the socket's current status: listening, connected, default, etc.), *device (the network card descriptor), *fp (the capture instance descriptor of the open socket), filter (the filter structure), and so on [14].

Web caching mechanism and cache replacement algorithm: static content caching improves the capacity and response speed of the web server in handling client requests. The caching mechanism is realized as follows: a static content index is cached on the web server; every index entry is a data block containing the address of the requested content along with its access-time and access-frequency attributes. The cache server is


used to store the static page content that the data blocks indexed by the web server point to.

Figure 6. Simplified TCP/IP stack

The web server checks each user request first. When the request hits the index buffer, the static page on the cache server pointed to by the data block is immediately returned to the user. When the request misses, the requested resource generates static content according to the cache replacement algorithm; the content is added to the cache server, and an index pointing to it is added to the web server cache.

The cache replacement algorithm works as follows:

When the buffer is full, the indexes on the web server are partitioned by access frequency: entries whose access frequency is greater than I go to domain High, and those with lower frequency go to domain Low. The value of I is chosen so that the ratio of High to Low is 2:1, i.e., the index blocks in High account for 2/3 of the total.

The indexes in Low are then sorted by access time, and the least recently accessed index block is swapped with the current block; the corresponding static content on the cache server is swapped synchronously.

To keep cached data consistent with the real content, each data block object in the web cache is assigned a validity period, a TTL (Time To Live). When the TTL expires, the object becomes invalid, and the next request for it causes the web server to refresh the cache.
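The replacement policy above can be sketched as follows. This is a simplified model with hypothetical names: the High:Low frequency split, LRU eviction within Low, and TTL invalidation follow the description, while the capacity, TTL default, and data layout are illustrative.

```python
import time

class StaticPageCache:
    """Sketch of the replacement policy: entry = [frequency, last_access, expiry]."""

    def __init__(self, capacity: int, ttl: float = 60.0):
        self.capacity = capacity
        self.ttl = ttl          # validity period per cached object
        self.index = {}         # url -> [freq, last_access, expiry]

    def _evict(self) -> None:
        # Partition by access frequency: the lowest-frequency third is
        # domain Low (so High : Low = 2 : 1), then evict the least
        # recently accessed block within Low.
        by_freq = sorted(self.index.items(), key=lambda kv: kv[1][0])
        low = by_freq[:max(1, len(by_freq) // 3)]
        victim = min(low, key=lambda kv: kv[1][1])[0]
        del self.index[victim]

    def request(self, url: str, now: float = None) -> bool:
        """Return True on a cache hit, False on a miss (page regenerated)."""
        now = time.monotonic() if now is None else now
        entry = self.index.get(url)
        if entry is not None and now < entry[2]:
            entry[0] += 1           # bump access frequency
            entry[1] = now          # bump last access time
            return True
        if entry is not None:       # TTL expired: flush and regenerate
            del self.index[url]
        if len(self.index) >= self.capacity:
            self._evict()
        self.index[url] = [1, now, now + self.ttl]
        return False
```

In the real system the index entries would additionally carry the address of the static content on the cache server; here eviction simply drops the entry.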

Implementation of the multi-queue priority control mechanism: the multi-queue priority control mechanism guarantees that, when a large number of user requests arrive at once, the web server can still respond appropriately and preserve its service capability. Each request is encapsulated with attributes such as the time of the last request, the request count, whether it is trusted, and a reference count.

The priority queues are implemented as follows: initialize three request queues, the black, gray, and white queues, with processing priority white > gray > black.

For each user request, the server first checks whether it already exists in the current queues. If it does, the reference count of the corresponding request object is incremented by 1.

If the request is not in any queue, the server checks whether it is trusted.

If the request is trusted, the server checks its historical request frequency over a period of time. If the number of requests is greater than a system-specified threshold K, the request is added to the white queue; otherwise, it remains in the gray queue.

If the request is untrusted, the request will join the black queue.

When handling requests, the server processes the white queue first; only when the white queue is empty does it process the gray queue, and only when both are empty does it turn to the black queue. The multi-queue workflow is shown in Fig. 7.
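The queue discipline described above can be sketched as a minimal model. The promotion threshold K, the caller-supplied trust flag, and the re-enqueueing of repeated requests are simplifications of the reference-counted design in the paper.

```python
from collections import deque

K = 5  # assumed promotion threshold: more than K requests -> white queue

class MultiQueueScheduler:
    def __init__(self):
        self.white, self.gray, self.black = deque(), deque(), deque()
        self.counts = {}                    # per-client request frequency

    def enqueue(self, client: str, trusted: bool = True) -> None:
        """Classify a request: untrusted -> black, frequent -> white,
        otherwise gray (the default for new requests)."""
        self.counts[client] = self.counts.get(client, 0) + 1
        if not trusted:
            self.black.append(client)
        elif self.counts[client] > K:
            self.white.append(client)
        else:
            self.gray.append(client)

    def next_request(self):
        """White first; gray only when white is empty; black only when
        both white and gray are empty."""
        for queue in (self.white, self.gray, self.black):
            if queue:
                return queue.popleft()
        return None

s = MultiQueueScheduler()
s.enqueue("attacker", trusted=False)
s.enqueue("visitor")
assert s.next_request() == "visitor"    # gray is served before black
assert s.next_request() == "attacker"
```

A production version would time out black-queue entries and age white-queue membership, as the attribute list above suggests.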

Figure 7. Multi-Queue System Work Flow


IV. PERFORMANCE ANALYSIS

User data must be split and encapsulated many times while being transmitted from the local host to a remote host, and in a traditional stack a data copy is made at each protocol layer along the way. This increases system cost and reduces performance. The performance improvements in LCCWS are mainly the following:

• The NAPI mechanism, based on device polling, improves packet capture speed [5]. Under heavy network traffic, standard NIC interrupts combined with layer-by-layer data copies and system calls consume a large share of CPU resources, leaving little for actually processing the data. With NAPI, once a DMA transfer interrupt occurs, NIC interrupt mode is closed first and all subsequent data is received in polling mode; while the kernel polls the device, the NIC raises no interrupts, which greatly reduces the number of NIC interrupts. When no packets are waiting, or the pending packets have been received, NIC interrupt mode is reopened. This greatly accelerates packet capture and processing.

• Data copying in DMA mode does not require the CPU to participate, which reduces CPU overhead.

• The user layer uses MMAP to access the network interface cache directly, avoiding copies between memory regions, shortening the path packets travel, and saving CPU overhead.

• While packets are being split and encapsulated, the improved web server passes them directly from the data-link layer up to the application layer, reducing how many times packets are copied in the intermediate protocol layers. Only data pointers are passed between the layers of the protocol suite, and the data actually moves only when it is sent out by the driver or read by the application program.

• Under the multi-queue priority control mechanism, requests are placed into white, gray, and black priority classes, and the web server serves higher-priority requests first, so lower-priority requests occupy fewer server resources and the server's overall response time is shortened. Thus the service performance of the server improves even when network requests exceed the server's capacity.

• Under the static page cache mechanism, frequently visited pages are stored as static pages in the server's cache, and the page indexes are kept on the web server. If a client request hits the index, the corresponding static page is fetched from the cache server and returned to the client. This mechanism shortens the server's response time and increases the amount of concurrent access and the processing efficiency of the system.

LCCWS effectively reduces CPU overhead, decreases data copying between memory regions, and improves communication efficiency; consequently, web server performance is greatly enhanced. The comparison of the traditional access mode and DMA mode is shown in Fig. 8, and the comparison of uncached and cached pages is shown in Fig. 9.

Figure 8. Comparison of Traditional Access Mode and DMA Mode

Figure 9. Comparison of page not cached and page cached

V. SIMULATION EXPERIMENT

Owing to resource constraints, no machine supporting the required DMA mode was available, and LCCWS's performance could not be compared directly with the Tomcat server's, so we carried out a simulation experiment instead.

The simulation experiment used the sending side of a distributed DoS pressure-measurement system to generate traffic. The experiment had four parts. First, we compared the packet capture rate of ordinary libpcap with that of libpcap on a system with the PF_RING kernel


modules. The libpcap with the PF_RING module uses the NAPI and MMAP mechanisms; with packet sizes between 512 and 1500 bytes, the packet capture rate rose by 19.86%. Second, the LCCWS system eliminated one data copy; with packet sizes kept between 512 and 1500 bytes, response time improved by 2.35% compared with the original system. Third, with the static page cache mechanism enabled in LCCWS and a number of page indexes established, the average response time for the same set of network requests improved by 21.96% compared with the original system without the mechanism. Finally, with the multi-queue priority control mechanism enabled, for the same group of requests, whose number exceeded the upper limit of the server's capacity, the average response time improved by 10.73% compared with the system without the mechanism.

From the experiments above, we can see that loading the PF_RING module and combining the NAPI and MMAP mechanisms improves packet capture performance, and that reducing data copying between the protocol stack and the upper application, together with the static page cache mechanism and the multi-queue priority control mechanism, also improves the web server's response speed. Using these four measures together, the performance of the improved web server LCCWS is clearly improved.

VI. CONCLUSION

The LCCWS prototype system designed in this paper alleviates the web server performance bottleneck through zero-copy techniques, optimization of the TCP/IP protocol stack, and related measures. The prototype provides a research foundation for the development of fast and efficient web server systems. However, the system's protocol suite still needs improvement; functions such as packet retransmission and flow and congestion control also require further study.

Future work will mainly improve the lightweight TCP/IP protocol stack and the socket communication functions, to implement a strongly practical, feature-rich, fast, and efficient web server prototype system step by step.

ACKNOWLEDGMENT

This work was supported in part by a grant from the National Natural Science Foundation of China (No. 60970129).

REFERENCES

[1] N. Yao, M. Zheng and J. Ju, "A High Performance Web Server based on Pipeline," Journal of Software, vol. 14, no. 6, pp. 1127-1133, 2003.

[2] "How to Improve Web Servers Performance," [Online]. Available: http://www.macrounion.com. [Accessed 12 11 2012].

[3] X. Ke, Z. Gong and J. Xia, "Research of the Zero-copy Technique and Its Implementation," Computer Engineering & Science, vol. 22, no. 5, pp. 17-20, 24, 2000.

[4] L. Wang, M. Wang and X. Wang, "Research and Implementation on Linux Packets Capturing of Gigabit Network," Jinan, 2007.

[5] L. Deri, "Improving Passive Packet Capture: Beyond Device Polling," [Online]. Available: http://luca.ntop.org/Ring.pdf. [Accessed 11 11 2012].

[6] "Web Servers under the Linux Platform," [Online]. Available: http://www.hackchina.com. [Accessed 10 11 2012].

[7] H. Zhu, C. Zhou and G. Chang, "Research and Implementation of Zero-Copy Technology Based on Device Driver in Linux," in In Proceeding of International Multi-Symposiums on Computer and Computational Sciences, 2006.

[8] B. Wang, B. Fang and X. Yun, "The Study and Implementation of Zero-Copy Packet Capture Platform," Chinese journal of computers, vol. 28, no. 1, pp. 46-52, 2005.

[9] D. Stancevic, "Zero copy I: User-mode Perspective," Linux Journal, vol. 2003, no. 105, pp. 1-3, 2003.

[10] L. Deri, "PF_RING User Guide," [Online]. Available: http://luca.ntop.org/User Guide .pdf. [Accessed 20 11 2012].

[11] L. Wang, "Working Principles of DMA and its Application to Improve hard Disk Speed," Heilongjiang Science and Technology Information, no. 18, p. 64, 2010.

[12] J. Song, X. Xie and S. Ran, "Simplified and Zero-copy TCP/IP Protocol based Data Capture System," China Measurement Technology, vol. 33, no. 1, pp. 114-117, 2007.

[13] H. Xu, J. Liu and Y. Wang, "Simplified Realization of Embedded TCP/IP Protocol Stack Based on ARM Core," Application Research of Computers, no. 10, pp. 251-253, 2006.

[14] G. Cao, "Analysis of the Linux kernel network stack source code," Beijing, Posts & Telecom Press, 2010, pp. 146-181,235-612.

[15] A. K. Amer Mohammed Al-Canaan, "Multimedia Web Services Performance: Analysis and Quantification of Binary Data Compression," Journal of Multimedia, vol. 6, no. 5, pp. 447-457, 2011.

[16] D. B. Lorenzo Sommaruga, "Towards a Semantic Web Based Model for the Tonal System in Standard IEEE 1599," Journal of Multimedia, vol. 4, no. 1, pp. 40-45, 2009.

[17] "Analysis of Web Server Performance Bottlenecks," [Online]. Available: http://www.yesky.com.

[18] J. Hu, J. Li and Z. Zeng, "SWSCF: A Semantic-based Web Service Composition Framework," Journal of Networks, vol. 4, no. 4, pp. 290-297, 2009.

[19] J. Zhu, H. Wu and G. Gao, "An Efficient Method of Web Sequential Pattern Mining Based on Session Filter and Transaction Identification," Journal of Networks, vol. 5, no. 9, pp. 1017-1025, 2010.

[20] Y. Zhou and L. Li, "Network Programming Technique and Implementation in TCP/IP Protocol," Aeronautical Computer Technique, vol. 32, no. 3, pp. 123-124, 128, 2002.


Haipeng Qu was born in Qingdao, China, on Dec. 6, 1972. He received the Master's degree in Computer Applications Technology from Ocean University of China, Qingdao, China, in Jul. 2002, and the PhD degree in Information Security from the State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing, China, in Mar. 2006. His major field of study is information security.

He has been working in the Department of Computer Science and Technology since Mar. 2006. His publications include "Fast Ethernet Instant Monitor Customizable Memory Access Model," Proceedings of NetSec 2007, and "An IP Trace-back Scheme with Packet Marking in Blocks," Journal of Computer Research and Development, 2005. His current and previous research interests include information security and network security, delay-tolerant networks, underwater acoustic sensor networks, etc.

Lili Wen was born in Liaocheng, Shandong Province. She received her Master's degree in Computer Applications Technology from Ocean University of China, Qingdao, China, in Jun. 2011. Her major field of study is information security and computer networks.

Yanfei Xu was born in Nanning, Guangxi Province. He received his Bachelor's degree in Computer Science and Technology from Ocean University of China, Qingdao, China, in Jun. 2011. His major field of study is information security and computer networks.

Ning Wang was born in Laiwu, Shandong Province. She received her Bachelor's degree in Computer Science and Technology from Ocean University of China, Qingdao, China, in Jun. 2011. Her major field of study is information security and computer networks.
