Post on 21-Dec-2015
2nd NEGST workshop
P2P Overlay Network for TCP Programming with UDP Hole Punching
Takayuki Okamoto, Taisuke Boku, Mitsuhisa Sato, Osamu Tatebe
Graduate School of Systems and Information Engineering,
University of Tsukuba
2nd NEGST workshop 2
Abstract
• A large number of idle PCs exist in the world
  • Most are behind NATs and firewalls
  • Special programming is required for them to communicate with each other: relay servers, NAT traversal
• We are developing a P2P communication library that makes PCs behind NATs and firewalls easy to use
  • UDP hole punching
  • Original reliable communication library on UDP/IP
  • User-level management
We use the term “NAT” for both NAT boxes and firewalls hereafter.
2nd NEGST workshop 3
Outline
• Motivation and objective
  • P2P computing
• Proposal of a scalable communication framework based on NAT traversal
  • Design and implementation of the communication library
• Evaluation of communication performance
  • Performance of UDP with our reliable communication library
• Works in France
2nd NEGST workshop 4
Motivation & background
• NAT problem
  • Most computing nodes are behind firewalls or NAT (Network Address Translation) boxes
  • These nodes cannot communicate with each other directly
  • With relay transfer, the bandwidth of the relay nodes becomes a bottleneck
• NAT traversal techniques
  • With several negotiation procedures, the nodes can communicate directly through the intermediate NATs
  • Complicated negotiation is required in each application program
[Figure: a cluster behind a firewall and PCs behind broadband routers (NAT), connected through the Internet]
2nd NEGST workshop 5
Objective
Goal: providing a communication framework for efficient and easily programmable HPC-P2P computing
• Easy use of nodes behind NATs
• High scalability
• High throughput
• High portability across a large variety of environments
[Figure: a cluster behind a firewall and PCs behind broadband routers (NAT), joined over the Internet by the new communication environment]
2nd NEGST workshop 6
Requirement specification
• Direct communication based on NAT traversal
• Name space independent from the physical one
• Fully distributed management system
• User-level implementation
[Figure: a cluster behind a firewall and PCs behind broadband routers (NAT), joined over the Internet by the new communication environment]
2nd NEGST workshop 7
Overlay networks
• Virtual networks constructed on the application layer
• Generally defined as “a routing (relay) system among involved nodes”
  • Independent from the physical network
  • Relay nodes may become bottlenecks
  • Applications neglect the network topology
• Our system
  • Provides a name space and communication methods between any pair of nodes without packet relay
  • Applications can be designed for effective communication on the physical network
  • Supports both applications and frameworks
2nd NEGST workshop 8
Design concept of our system
Two different types of communication
• Management and control in our system
• Data transfer in applications

              Name & negotiation            Data transfer
              (management system)           (communication library)
Requirement   Consistency, quick response   High throughput
Style         Server-client, DHT            P2P
Topology      Tree                          Direct
Feature       Using super-nodes with        Without relay nodes
              very little traffic

High scalability
2nd NEGST workshop 9
Design of communication library
• Socket API compatible with TCP/IP
  • Easy porting of existing applications written for TCP/IP
  • Easy programming with large flexibility, not limited to the “master-slave” style
• Communication method is automatically selected
  • Pure (direct) TCP/IP is the best
  • UPnP is supported by a wide class of home-use NATs
  • UDP hole punching is available on most NATs
⇒ For TCP programming, a reliable streaming communication feature must be provided by software
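The automatic selection described on this slide might be sketched as follows. Only "pure TCP/IP is the best" comes from the slide. The ordering of UPnP before UDP hole punching, the relay fallback, and every name in the code (`Method`, `select_method`, the three boolean probes) are assumptions made for illustration, not the library's actual interface.

```python
from enum import Enum


class Method(Enum):
    DIRECT_TCP = "direct TCP/IP"
    UPNP = "TCP/IP through a UPnP port mapping"
    UDP_HOLE_PUNCHING = "SoU over UDP hole punching"
    RELAY = "relay through a server"


def select_method(direct_tcp_ok: bool, upnp_ok: bool, hole_punch_ok: bool) -> Method:
    """Return the first usable method in an assumed order of preference."""
    if direct_tcp_ok:       # no NAT in the way: plain TCP/IP is the best case
        return Method.DIRECT_TCP
    if upnp_ok:             # the NAT box can be asked to forward a port
        return Method.UPNP
    if hole_punch_ok:       # the NAT keeps a stable mapping for UDP
        return Method.UDP_HOLE_PUNCHING
    return Method.RELAY     # assumed last resort when nothing direct works


print(select_method(False, False, True).value)
```

Keeping the decision in one function means an application written against the socket-style API never needs to know which path was chosen.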
2nd NEGST workshop 10
Reliable communication on UDP/IP
• RI2N/UDP
  • Developed by the JST-CREST “Mega-Scale Computing” Project
  • Basically designed for fault-tolerant communication on PC clusters with Ethernet
  • Based on UDP/IP, but provides TCP-like streaming communication, retransmission and a simple congestion control algorithm
• Porting to our communication layer for P2P computing
  ⇒ SoU (Stream on UDP) library
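The idea of layering a reliable stream on unreliable datagrams can be illustrated with a toy, in-process simulation. This is not the SoU or RI2N/UDP implementation: the names (`Receiver`, `send_reliably`, `is_lost`) are hypothetical, acknowledgements are assumed never to be lost, and congestion control is left out entirely.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, bytes]  # (sequence number, payload)


class Receiver:
    """Stores segments by sequence number and rebuilds the byte stream."""

    def __init__(self) -> None:
        self.buffer: Dict[int, bytes] = {}

    def on_segment(self, segment: Segment) -> int:
        seq, payload = segment
        self.buffer[seq] = payload  # a duplicate simply overwrites itself
        return seq                  # the acknowledgement sent back

    def stream(self) -> bytes:
        return b"".join(self.buffer[seq] for seq in sorted(self.buffer))


def send_reliably(data: bytes, mss: int, receiver: Receiver,
                  is_lost: Callable[[int, int], bool], max_rounds: int = 10) -> int:
    """Send data in mss-sized segments and retransmit until all are acked.

    is_lost(seq, round_no) models the lossy datagram channel.
    Returns the total number of transmissions.
    """
    segments: List[Segment] = [
        (seq, data[off:off + mss])
        for seq, off in enumerate(range(0, len(data), mss))
    ]
    unacked: Dict[int, bytes] = dict(segments)
    transmissions = 0
    for round_no in range(max_rounds):
        if not unacked:
            break
        for seq in sorted(unacked):
            transmissions += 1
            if is_lost(seq, round_no):
                continue            # dropped on the way, so no ack arrives
            ack = receiver.on_segment((seq, unacked[seq]))
            del unacked[ack]
    if unacked:
        raise TimeoutError("some segments were never acknowledged")
    return transmissions


receiver = Receiver()
message = b"hello, overlay network"
# Drop every odd-numbered segment on the first round only.
count = send_reliably(message, 4, receiver,
                      lambda seq, round_no: round_no == 0 and seq % 2 == 1)
print(receiver.stream() == message, count)
```

The receiver sees the same byte stream it would get from TCP, which is what lets a TCP-style socket API sit on top of a UDP channel.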
2nd NEGST workshop 11
Preliminary performance evaluation
• Performance evaluation of the SoU library
  • Throughput
  • Latency
• Environment
  • Two client nodes in two houses under different ISPs over the Internet
  • The server node at the University of Tsukuba
  • Home-use “broadband routers” are used
    • BBR-4HG: max 92 Mbps
    • BLR3-TX4: max 90 Mbps
• Four connection methods
  (1) TCP DMZ
  (2) SoU DMZ
  (3) TCP relay
  (4) SoU + UDP hole punching
[Figure: Node-A behind Router-1 (NAT) and Node-B behind Router-2 (NAT), connected via ISP1 (So-net) and ISP2 (BB.Excite) over the Internet to the server at the university on SINET (MEXT)]
Router-1: BBR-4HG
Router-2: BLR3-TX4
Node-A: Pentium M 1.2 GHz
Node-B: Pentium M 2.26 GHz
server: Xeon 3.0 GHz
2nd NEGST workshop 12
Connection methods (1) and (2)
• Method (1): TCP/IP with the DMZ function of the NAT
• Method (2): SoU with the “UDP” DMZ function of the NAT
• DMZ function: a port forwarding function that transfers all inbound packets arriving at the NAT to a node behind it (set manually)
[Figure: Node-A (port 1000) behind NAT-1 and Node-B (port 2000) behind NAT-2 on the global network; TCP/IP (TCP DMZ) or UDP/IP (SoU DMZ) packets travel anywhere → NAT-1:1000 → Node-A:1000]
2nd NEGST workshop 13
Connection method (3)
• TCP/IP packet relay through the server
  • Each node opens a TCP/IP channel to the server
  • The server relays packets from one side to the other through the TCP/IP channels
  • Two transmissions are required to deliver each packet
[Figure: Node-A (port 1000) behind NAT-1 and Node-B (port 2000) behind NAT-2 on the global network, each connected to the server over TCP/IP (TCP relay)]
2nd NEGST workshop 14
Connection method (4)
• SoU over UDP hole punching
  • All nodes share their IP address and port information via the server, through a management channel over TCP/IP
  • The two client nodes establish a direct UDP/IP communication channel by UDP hole punching
  • Over this UDP channel, SoU provides reliable streaming communication between Node-A and Node-B
[Figure: Node-A (port 1000) behind NAT-1 and Node-B (port 2000) behind NAT-2 on the global network. The server passes information (address + port) to both nodes, UDP hole punching opens the path, and the SoU connection carries the data transfer]
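The information sharing step (address + port) can be pictured as a small registry kept by the server. The sketch below is only an illustration: `RendezvousServer`, `register` and `introduce` are hypothetical names, and the addresses are documentation-range placeholders rather than values from the experiment.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (public IP address, public port) as seen by the server


class RendezvousServer:
    """Remembers the public endpoint of each node and introduces pairs."""

    def __init__(self) -> None:
        self.endpoints: Dict[str, Endpoint] = {}

    def register(self, name: str, observed: Endpoint) -> None:
        # In a real system this would be the source address of the
        # node's management connection, i.e. its NAT's public side.
        self.endpoints[name] = observed

    def introduce(self, a: str, b: str) -> Dict[str, Endpoint]:
        """Tell each node where to aim its hole-punching packets."""
        return {a: self.endpoints[b], b: self.endpoints[a]}


server = RendezvousServer()
server.register("Node-A", ("198.51.100.1", 1000))  # NAT-1:1000 (placeholder IP)
server.register("Node-B", ("203.0.113.2", 2000))   # NAT-2:2000 (placeholder IP)
targets = server.introduce("Node-A", "Node-B")
print(targets["Node-A"])  # the endpoint Node-A should send to
```

Only this small exchange passes through the server; the data itself then flows directly between the two nodes.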
2nd NEGST workshop 15
Throughput
Throughput [MB/s], single-sided burst transfer:
  TCP DMZ                   1.67
  SoU DMZ                   1.43
  TCP relay                 2.41
  SoU + UDP hole punching   1.44
• TCP DMZ vs. SoU + UDP hole punching
  • Simple vs. complex
  • Only about 15% difference
  • Realizes P2P direct communication without the NAT problem
• TCP DMZ vs. TCP relay
  • Direct vs. indirect
  • TCP relay is 45% higher
  • Communication path between ISPs
    • Throughput depends on the bandwidth between ISPs
    • The university has a strong connection with both ISPs
  • TCP relay becomes a bottleneck in a scalable system
• SoU + UDP hole punching is the best way for P2P computing
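The two percentages quoted on this slide can be checked against the chart values. The short calculation below uses TCP DMZ as the baseline; `relative_difference` is just a helper name chosen here. It gives roughly 14% lower for SoU + UDP hole punching and roughly 44% higher for TCP relay, in line with the rounded 15% and 45% on the slide.

```python
# Throughput values read from the chart, in MB/s.
throughput_mb_s = {
    "TCP DMZ": 1.67,
    "SoU DMZ": 1.43,
    "TCP relay": 2.41,
    "SoU + UDP hole punching": 1.44,
}


def relative_difference(value: float, baseline: float) -> float:
    """Signed difference of value from baseline, as a fraction of baseline."""
    return (value - baseline) / baseline


baseline = throughput_mb_s["TCP DMZ"]
for method in ("SoU + UDP hole punching", "TCP relay"):
    diff = relative_difference(throughput_mb_s[method], baseline)
    print(f"{method}: {diff:+.1%} relative to TCP DMZ")
```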
2nd NEGST workshop 16
Latency
Latency [ms], average time for a 1-byte message transfer:
  TCP DMZ                   15.4
  SoU DMZ                   14.6
  TCP relay                 24.6
  SoU + UDP hole punching   14.3
• Three methods (TCP DMZ, SoU DMZ, SoU + UDP hole punching)
  • Very small difference
  • Physical latency is large, so the difference among protocols is relatively small
  • Same hop count ≈ same latency
• TCP relay
  • The largest
  • Twice the hop count
• Latency depends on the number of hops in the WAN
• Throughput depends on absolute bandwidth
2nd NEGST workshop 17
Works in France (1)
• Porting UDP hole punching to Private Virtual Cluster (tun version)
  • PVC provides IP-level virtualization
  • Reliability is not required
  • Throughput on a LAN reaches 90 Mbps on 100BASE-TX with MTU tuning
[Figure: protocol stack. Application and daemon; TCP; IP; tun device; UDP hole punching; UDP; real NIC]
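One reason MTU tuning matters when IP packets are tunnelled over UDP is that the outer IPv4 and UDP headers take space on the physical link. The arithmetic below is a general sketch, not a description of PVC: the slides do not say which MTU values were used, and `tunnel_header` is a hypothetical parameter for any extra header the tunnel might add.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def tunnel_mtu(link_mtu: int = 1500, tunnel_header: int = 0) -> int:
    """Largest inner IP packet that fits in one outer UDP datagram
    without fragmentation on a link with the given MTU."""
    return link_mtu - IPV4_HEADER - UDP_HEADER - tunnel_header


print(tunnel_mtu())  # inner MTU for a standard 1500-byte Ethernet MTU
```

Setting the tun device MTU at or below this value keeps each inner packet inside a single outer datagram.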
2nd NEGST workshop 18
Works in France (2)
• Making arrangements for a performance evaluation between France and Japan
  • Nodes in Grid5000 can be used only among themselves
  • 2 nodes in France and 4 nodes in Japan are available
[Figure: the available PCs in France and in Japan (at the University of Tsukuba and at home)]
2nd NEGST workshop 19
Future works
• Performance improvement of the SoU library
  • Implementing more sophisticated flow control algorithms
• Performance evaluation between France and Japan
  • Comparing SoU with TCP
  • Upgrading SoU for throughput under large latency
2nd NEGST workshop 23
The procedure of UDP hole punching
• The nodes share their IP address and port information through the server
• Each node sends a packet to the other side’s NAT (to NAT-2:2000, to NAT-1:1000); the first packet may be dropped (×?)
• The mapping created by the outbound packets lets the peer’s packets in:
  NAT-2:2000 → NAT-1:1000 → Node-A:1000
  NAT-1:1000 → NAT-2:2000 → Node-B:2000
• This method is available with “Cone NATs”
[Figure: Node-A (port 1000) behind NAT-1, Node-B (port 2000) behind NAT-2, and the server on the global network]
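The sequence on this slide can be replayed with a toy model of two cone NATs. Everything here is a simplification for illustration: `ConeNat` and `send` are hypothetical names, the NAT is assumed to keep the private port number as its public port (as in the slide's port 1000 and port 2000), and the addresses are placeholders.

```python
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ConeNat:
    """Toy restricted-cone NAT: one public port per private endpoint;
    inbound packets pass only from remotes already contacted."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.mapping: Dict[int, Endpoint] = {}       # public port -> private endpoint
        self.allowed: Dict[int, Set[Endpoint]] = {}  # public port -> contacted remotes

    def outbound(self, private: Endpoint, remote: Endpoint) -> Endpoint:
        port = private[1]  # assumption: the port number is preserved
        self.mapping[port] = private
        self.allowed.setdefault(port, set()).add(remote)
        return (self.public_ip, port)

    def inbound(self, source: Endpoint, public_port: int) -> Optional[Endpoint]:
        if source in self.allowed.get(public_port, set()):
            return self.mapping[public_port]
        return None  # no hole yet: the packet is dropped


def send(src_nat: ConeNat, src_private: Endpoint,
         dst_nat: ConeNat, dst_public_port: int) -> Optional[Endpoint]:
    """Send one datagram to the peer NAT; return who receives it, if anyone."""
    public_src = src_nat.outbound(src_private, (dst_nat.public_ip, dst_public_port))
    return dst_nat.inbound(public_src, dst_public_port)


nat1, nat2 = ConeNat("198.51.100.1"), ConeNat("203.0.113.2")
node_a, node_b = ("192.168.0.10", 1000), ("192.168.1.20", 2000)

first = send(nat1, node_a, nat2, 2000)   # A to NAT-2:2000: dropped, but opens A's hole
second = send(nat2, node_b, nat1, 1000)  # B to NAT-1:1000: passes through A's hole
third = send(nat1, node_a, nat2, 2000)   # A to NAT-2:2000: now passes through B's hole
print(first, second, third)
```

The model also suggests why the slide restricts the method to cone NATs: it relies on each NAT reusing one public port for a private endpoint whatever the destination, so the endpoint learned from the server is the one the peer can reach.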
2nd NEGST workshop 24
Motivation & background
• P2P (peer-to-peer) computing and its potential power
  • Utilize the great potential computation power provided by a large number of PCs
  • Public Resource Computing: aggregating the computation power of idle PCs in homes and offices in a P2P manner
  • Volunteer computing (BOINC, etc.)
    • Supports only master-worker style applications
[Figure: clusters and PCs in universities, PCs in homes and PCs in offices]
2nd NEGST workshop 25
Conclusion
• We proposed a communication framework for P2P computing for HPC applications with high scalability
  • Easily programmable even through NATs
  • Scalable to a large number of nodes without a relay-server bottleneck
• Performance evaluation in a WAN environment
  • The SoU library provides acceptable performance
  • The cost to establish a connection is relatively large, but negligible for long-running HPC applications
• Our system has acceptable performance and scalability for HPC-P2P
2nd NEGST workshop 26
Related work
• Generic studies: JXTA, NAT BLASTER, STUNT, OCALA and Skype A2A API …
• NAT traversal techniques
  • Wide-Area Communication for Grids: An Integrated Solution to Connectivity, Performance and Security Problems [Alexandre et al., HPDC’04]
    • Simultaneous TCP: another TCP connection establishment procedure described in RFC 793
    • User-level implementation
    • Usable under more particular conditions than UDP hole punching
• Overlay network without relays
  • Private Virtual Cluster: Infrastructure and Protocol for Instant Grids [Ala et al., Europar’06]
    • High application portability with TUN/TAP
    • Installation needs root authority
2nd NEGST workshop 27
NAT traversal techniques
• Techniques to allow direct communication among nodes behind NATs
  • UDP hole punching
    • The most widely used method, and easy to implement at user level
    • Communication is limited to UDP/IP
  • UPnP (Universal Plug and Play)
    • Configures hardware devices temporarily through the network
    • Both UDP/IP and TCP/IP are available
    • Each NAT box must support the feature explicitly
• They are used mainly in multimedia applications
  • VoIP (Skype, Google Talk, etc.)
  • Constant throughput is required for long periods
  • A certain amount of packet loss is tolerated without retransmission over UDP/IP
• For a wider variety of applications, we need more concrete and easy-to-control communication methods
2nd NEGST workshop 28
Cost to establish a connection
Most preliminary result
• TCP DMZ, SoU DMZ and TCP relay
  • Same as the round-trip time
• SoU + UDP hole punching
  • Negotiation, UDP hole punching and SoU establishment are required
  • About 7 times the round-trip time
  • For HPC, this is a small overhead

The shortest time to establish a connection:
  TCP DMZ                   28.9 ms
  SoU DMZ                   28.5 ms
  TCP relay                 23.3 ms
  SoU + UDP hole punching   199.4 ms
2nd NEGST workshop 29
Cost to establish a connection
[Chart: the shortest time to establish a connection [ms]: TCP DMZ 28.9, RUDP DMZ 28.5, TCP relay 23.3, RUDP + UDP hole punching 199.4]
RUDP + UDP hole punching requires 7 transmissions on the WAN:
• 1 for DNS resolution
• 4 for sharing address information
• 1 for UDP hole punching
• 1 for SoU connection establishment
Acceptable for HPC applications as a small overhead
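The breakdown above can be checked against the measured times with a few lines of arithmetic. Dividing the 199.4 ms by the 7 WAN transmissions gives about 28.5 ms each, close to the single round trip measured for TCP DMZ (28.9 ms) and for the DMZ method over UDP (28.5 ms). The variable names are chosen here for illustration.

```python
# WAN transmissions needed to set up a hole-punched connection, per the slide.
transmissions = {
    "DNS resolution": 1,
    "sharing of address information": 4,
    "UDP hole punching": 1,
    "SoU connection establishment": 1,
}

total = sum(transmissions.values())
measured_ms = 199.4                  # shortest measured establishment time
per_transmission_ms = measured_ms / total

print(total, round(per_transmission_ms, 1))
```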
2nd NEGST workshop 30
Design of management system
• Distributed “super-nodes” manage the system
  • Name space management based on a DHT (Distributed Hash Table)
  • Help the negotiation among NATs for UDP hole punching
  • Relay packets only when necessary
[Figure: server nodes (super-nodes) linked by a DHT; client nodes behind routers (NAT, firewall) linked by direct connections (TCP, UPnP, UDP hole punching)]
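The slides do not say which DHT algorithm is used, so the following is only a generic sketch of how a name could be assigned to a super-node with consistent hashing. `NameRing`, `ring_position` and the node names are all hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(key: str) -> int:
    """Map a string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class NameRing:
    """Assigns each virtual node name to the super-node that follows it on the ring."""

    def __init__(self, super_nodes: List[str]) -> None:
        self.ring: List[Tuple[int, str]] = sorted(
            (ring_position(node), node) for node in super_nodes
        )
        self.positions = [position for position, _ in self.ring]

    def responsible_node(self, name: str) -> str:
        index = bisect.bisect_right(self.positions, ring_position(name))
        return self.ring[index % len(self.ring)][1]  # wrap around the ring


ring = NameRing(["super-1", "super-2", "super-3"])
owner = ring.responsible_node("node-a")
print(owner)
```

With this scheme, adding or removing one super-node only moves the names adjacent to it on the ring, which fits the goal of fully distributed and scalable management.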
2nd NEGST workshop 31
Structure of Management System
• Server-client: a server and many clients
• Super-node: many super-nodes and many common nodes
[Figure: super-nodes linked by a DHT; common nodes and clients behind routers (NAT, firewall) linked by direct connections (TCP, UPnP, UDP hole punching)]
2nd NEGST workshop 32
System design overview
• A framework of Public Resource Computing etc. runs on top of our system
• Our system
  • Communication library: TCP/IP, NAT traversal (UDP hole punching over UDP/IP, UPnP)
    • Provides direct communication for data through NATs
  • Management system (TCP): name management, node management
    • Monitors the overlapping of the names
    • Holds TCP connections with all client nodes
• DHT (Distributed Hash Table) is used for consistent and scalable management
2nd NEGST workshop 33
System design overview
• Various frameworks for Public Resource Computing etc. run on top of our system
• Our system
  • Communication library: TCP/IP, NAT traversal (UDP hole punching over UDP/IP, UPnP)
    • Provides direct communication for data through NATs
  • Management system (TCP): name management, node management
    • Name resolution from a virtual name to a real IP address
    • Node pair rendezvous for NAT traversal
2nd NEGST workshop 34
Latency
[Figure: link latencies among Node-A (behind Router-1, NAT), Node-B (behind Router-2, NAT) and the server over the Internet: 15 ms, 10 ms and 11 ms]
2nd NEGST workshop 35
Cost to establish a connection
Most preliminary result
• TCP DMZ, SoU DMZ, TCP relay
  • Request and reply on TCP or SoU = round-trip time
• SoU + UDP hole punching
  • Negotiation, UDP hole punching and SoU establishment = round-trip time × 7

  TCP DMZ                   28.9
  SoU DMZ                   28.5
  TCP relay                 23.3
  SoU + UDP hole punching   199.4
2nd NEGST workshop 36
The procedure of UDP hole punching
• Information is transferred through a server
• Each node sends a packet to the other side’s NAT (to NAT-2:2000, to NAT-1:1000); a packet that arrives before the receiving NAT has a mapping is dropped (×)
• The mapping is created automatically, and the peer becomes reachable using this mapping information:
  NAT-2:2000 → NAT-1:1000 → Node-A:1000
  NAT-1:1000 → NAT-2:2000 → Node-B:2000 (reachable to Node-B)
• This method is available with “Cone NATs”
[Figure: Node-A (port 1000) behind NAT-1, Node-B (port 2000) behind NAT-2, and the server on the global network]
2nd NEGST workshop 37
Reliable communication on UDP/IP
• RI2N/UDP
  • Developed by the JST-CREST “Mega-Scale Computing” Project
  • Basically designed for fault-tolerant communication on PC clusters with Ethernet
  • Based on UDP/IP, but provides TCP-like streaming communication, retransmission and a simple congestion control algorithm
• Porting to our communication layer for P2P computing
  ⇒ RUDP (Reliable UDP) library
[Figure: packet format. Fields: type, window size, (padding), source port, destination port, sequence number, option, data. Annotations: ack bitmap (for selective acknowledgements), link status (fail or not; to share the failure information)]
All RI2N channels share only one UDP port
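A header with the fields named in the figure could be encoded as below. The field widths and their order on the wire are not given in the slides, so the layout (`HEADER`), the helper names and the sample values are all assumptions made for illustration. Because all channels share one UDP port, the source and destination ports in this header are presumably library-level ports used to tell channels apart.

```python
import struct
from typing import Tuple

# Assumed layout, network byte order (widths are NOT from the slides):
#   type (1 byte), window size (1 byte), padding (2 bytes),
#   source port (2), destination port (2),
#   sequence number (4), option (4)
HEADER = struct.Struct("!BBxxHHII")


def pack_segment(seg_type: int, window: int, src_port: int, dst_port: int,
                 seq: int, option: int, data: bytes) -> bytes:
    """Prefix the payload with the assumed fixed-size header."""
    return HEADER.pack(seg_type, window, src_port, dst_port, seq, option) + data


def unpack_segment(packet: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Split a packet into its header fields and payload."""
    return HEADER.unpack_from(packet), packet[HEADER.size:]


# The option value here stands in for an ack bitmap (an assumption).
packet = pack_segment(1, 32, 1000, 2000, 42, 0b1011, b"payload")
fields, payload = unpack_segment(packet)
print(HEADER.size, fields, payload)
```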