Exploiting Graphics Processors for High-performance IP Lookup in Software Routers
Author: Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu
Publisher: IEEE INFOCOM 2011
Presenter: Ye-Zhi Chen
Date: 2012/01/11
INTRODUCTION
This paper presents GALE, a GPU-Accelerated Lookup Engine, to enable high-performance IP lookup in software routers.
The authors leverage the Compute Unified Device Architecture (CUDA) programming model to enable parallel IP lookup on the many-core GPU.
INTRODUCTION
The key innovations include:
1) IP addresses are directly translated into memory addresses using a large direct table in GPU memory that stores all possible routing prefixes.
2) The authors also map the route-update operations to CUDA in order to exploit the GPU’s vast processing power.
ARCHITECTURE
GALE keeps the traditional trie-based routing table in system memory.
GALE also stores the next-hop information for all possible IP prefixes in a large, fixed-size memory block on the GPU, which is referred to as the direct table.
GALE maintains two routing tables because there is an inherent tradeoff between efficiency and overhead when choosing a direct table versus a trie for a particular function.
ARCHITECTURE
GALE employs a control thread and a pool of working threads to serve the lookup and update requests.
The processing of requests is divided into the following steps:
1) The control thread reads a group of requests of the same type from the request queue, where requests are classified as Lookup, Insertion, Modification, or Deletion operations.
2) The control thread activates one idle working thread in the thread pool to process the group of requests.
3) The working threads invoke the corresponding GPU and CPU code to perform the lookup and update operations.
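
The three steps above can be sketched on the CPU side as follows. This is a minimal illustration, not the authors' implementation: the `(operation, payload)` request format, the names `process_group` and `control_loop`, and the use of Python's `ThreadPoolExecutor` as the thread pool are all assumptions made for the example, and the worker body is a stand-in for the GPU and CPU code that GALE would invoke.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

def process_group(op, payloads):
    """Stand-in for a working thread. In GALE this is where the GPU kernel
    (and the CPU trie code, for updates) would be invoked for the group."""
    return op, len(payloads)

def control_loop(requests, num_workers=8):
    """Control thread: read runs of same-type requests and hand each run
    to one working thread from the pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(process_group, op, [payload for _, payload in group])
            for op, group in groupby(requests, key=lambda r: r[0])
        ]
        # Collect results in submission order.
        return [f.result() for f in futures]

# Hypothetical request stream: (operation, payload) tuples.
requests = [
    ("lookup", "10.0.0.1"),
    ("lookup", "10.0.0.2"),
    ("insert", "10.1.0.0/16"),
    ("lookup", "10.1.2.3"),
]
results = control_loop(requests)
print(results)
```

The sketch only groups consecutive requests of the same type. It does not address ordering between an update group and a later lookup group that run concurrently, which the slides do not describe.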
ARCHITECTURE
With GALE, the parallelism in request processing is exploited in two ways.
1. Because the CPU used is also multi-core, the working threads can potentially execute simultaneously. Meanwhile, each working thread instructs the GPU to launch one kernel. As the GTX 470 supports multiple kernels on different SMs, the different request groups can thus be scheduled in parallel.
2. Inside one kernel, all requests in the group are of the same type. Parallelism is thus achieved by mapping these requests to CUDA threads, which in turn are scheduled in parallel onto different SPs on the GPU.
LOOKUP
The authors explore an IP lookup solution with O(1) memory access and computational complexity.
Each IP address maps to exactly one entry in the direct table.
They found that over 40% of IP prefixes have a length of 24, and over 99% of IP prefixes have a length less than or equal to 24.
They leverage this fact and propose storing all possible prefixes of length 24 or less in one direct table (which costs only 2^24 = 16M entries) and using a separate long table to store the prefixes that are longer than 24 bits.
LOOKUP
Only one address translation is needed to find the corresponding routing entry. Meanwhile, only one memory access is required to fetch the next-hop destination.
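
A minimal CPU-side sketch of this direct-table lookup, for prefixes of at most 24 bits. Several details are assumptions made for illustration: the one-byte next-hop ID per entry, the `NO_ROUTE` sentinel, and the names `direct_index` and `lookup`. The long table for prefixes beyond 24 bits is omitted because the slides do not describe its layout.

```python
import ipaddress

DIRECT_BITS = 24                            # the direct table is indexed by the top 24 address bits
NO_ROUTE = 0                                # assumed sentinel meaning "no next hop installed"
direct_table = bytearray(1 << DIRECT_BITS)  # 2^24 entries, one assumed byte-sized next-hop ID each

def direct_index(addr):
    """Address translation: the top 24 bits of the IPv4 address are the table index."""
    return int(ipaddress.IPv4Address(addr)) >> (32 - DIRECT_BITS)

def lookup(addr):
    """One index computation followed by one memory access."""
    return direct_table[direct_index(addr)]

# Install 192.168.1.0/24 -> next hop 7 (a /24 prefix occupies exactly one entry).
direct_table[direct_index("192.168.1.0")] = 7

print(lookup("192.168.1.42"))  # 7
print(lookup("192.168.2.1"))   # 0, i.e. NO_ROUTE
```

On the GPU, each CUDA thread would perform the body of `lookup` for one request, so the per-request cost stays constant regardless of the routing table contents.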
UPDATE
A route update involves operations on both the direct table and the trie structure.
Because the direct table has a fixed size, there is no need to allocate or release entry space upon route-update operations.
To avoid trie-traversal operations on the GPU, the authors introduce a length table that records the prefix length.
UPDATE
1) Insertion and Modification: Insertion and modification are essentially the same operation for the direct table.
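
One way insertion and modification could work against the direct table and the length table is sketched below. The overwrite rule (an entry is rewritten only when its stored prefix length is no longer than the new prefix) is an assumption about how the length table preserves longest-prefix matching without a trie traversal; it is not stated on the slides. The names `span`, `insert_or_modify`, and `idx` are hypothetical, as is the one-byte entry size.

```python
import ipaddress

DIRECT_BITS = 24
SIZE = 1 << DIRECT_BITS
next_hop = bytearray(SIZE)  # direct table: assumed one-byte next-hop ID, 0 = none
length = bytearray(SIZE)    # length table: prefix length that owns each entry

def span(cidr):
    """Return (first index, entry count, prefix length) for a prefix of <= 24 bits."""
    net = ipaddress.ip_network(cidr)
    start = int(net.network_address) >> (32 - DIRECT_BITS)
    return start, 1 << (DIRECT_BITS - net.prefixlen), net.prefixlen

def insert_or_modify(cidr, hop):
    start, count, plen = span(cidr)
    # In GALE each entry in the range would be handled by one CUDA thread.
    for i in range(start, start + count):
        if length[i] <= plen:   # assumed rule: never overwrite a more specific route
            next_hop[i] = hop
            length[i] = plen

def idx(addr):
    return int(ipaddress.IPv4Address(addr)) >> (32 - DIRECT_BITS)

insert_or_modify("10.0.0.0/8", 1)    # insertion
insert_or_modify("10.1.0.0/16", 2)   # more specific insertion
insert_or_modify("10.0.0.0/8", 3)    # modification of the /8 next hop

print(next_hop[idx("10.2.0.1")], length[idx("10.2.0.1")])  # 3 8
print(next_hop[idx("10.1.2.3")], length[idx("10.1.2.3")])  # 2 16
```

Under this rule the same loop serves both operations: a modification is simply an insertion whose prefix length equals the stored length of the entries it owns.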
UPDATE
2) Deletion: The deletion operation is similar to modification, except that while updating the range of entries, the next-hop information is set to that of the parent node in the trie.
An entry is deleted as follows:
The next hop and prefix length of the updated entry are replaced with the parent’s next hop and prefix length. The parent node is obtained from the trie traversal that the CPU performs while deleting the entry node from the trie.
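
The deletion step can be sketched as follows. The parent's next hop and prefix length are passed in as arguments, standing in for the result of the CPU-side trie traversal. The condition that only entries whose stored length equals the deleted prefix's length are rewritten is an assumption made so that more specific routes survive; the helper names `span`, `fill`, `delete`, and `idx` are hypothetical.

```python
import ipaddress

DIRECT_BITS = 24
SIZE = 1 << DIRECT_BITS
next_hop = bytearray(SIZE)  # direct table: assumed one-byte next-hop ID, 0 = none
length = bytearray(SIZE)    # length table: prefix length that owns each entry

def span(cidr):
    """Return (first index, entry count, prefix length) for a prefix of <= 24 bits."""
    net = ipaddress.ip_network(cidr)
    start = int(net.network_address) >> (32 - DIRECT_BITS)
    return start, 1 << (DIRECT_BITS - net.prefixlen), net.prefixlen

def fill(cidr, hop):
    """Test setup only: stamp a prefix over its whole range."""
    start, count, plen = span(cidr)
    next_hop[start:start + count] = bytes([hop]) * count
    length[start:start + count] = bytes([plen]) * count

def delete(cidr, parent_hop, parent_len):
    """parent_hop and parent_len come from the trie traversal on the CPU."""
    start, count, plen = span(cidr)
    for i in range(start, start + count):  # one CUDA thread per entry in GALE
        if length[i] == plen:               # assumed: skip entries owned by longer prefixes
            next_hop[i] = parent_hop
            length[i] = parent_len

def idx(addr):
    return int(ipaddress.IPv4Address(addr)) >> (32 - DIRECT_BITS)

# Build a small table, shortest prefix first.
fill("10.0.0.0/8", 1)
fill("10.1.0.0/16", 2)
fill("10.1.2.0/24", 5)

# Delete the /16; its parent in the trie is 10.0.0.0/8 with next hop 1.
delete("10.1.0.0/16", parent_hop=1, parent_len=8)

print(next_hop[idx("10.1.3.9")], length[idx("10.1.3.9")])  # 1 8
print(next_hop[idx("10.1.2.9")], length[idx("10.1.2.9")])  # 5 24
```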
EXPERIMENT
The routing datasets used in the experiments are from FUNET and RIS.
Desktop PC equipment ($1122):
Core i5 750 CPU, 2.66 GHz (4 cores)
NVIDIA GeForce GTX 470 with 1280 MB of global memory and 448 stream processors ($428)
OS: 64-bit x86 Fedora 13 Linux distribution with unmodified kernel 2.6.33
CUDA version: 3.1
EXPERIMENT
The experiments use the following default configuration:
8 working threads.
Every CUDA thread (running on the GPU) takes just one lookup request, or updates one entry when performing update tasks.
Every CUDA grid contains 44 thread blocks, and every block contains 256 threads.
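
The arithmetic implied by this configuration can be worked through as below, assuming a one-dimensional grid of 44 blocks with 256 threads each and one request per thread. The helper names and the one-million-request workload are hypothetical and used only for illustration.

```python
BLOCKS_PER_GRID = 44
THREADS_PER_BLOCK = 256

def global_thread_id(block_idx, thread_idx):
    """Python equivalent of CUDA's blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * THREADS_PER_BLOCK + thread_idx

def launches_needed(num_requests):
    """Kernel launches required when every thread takes exactly one request."""
    per_launch = BLOCKS_PER_GRID * THREADS_PER_BLOCK
    return -(-num_requests // per_launch)  # ceiling division

requests_per_launch = BLOCKS_PER_GRID * THREADS_PER_BLOCK
print(requests_per_launch)            # 11264
print(global_thread_id(43, 255))      # 11263, the last thread in the grid
print(launches_needed(1_000_000))     # 89
```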