Time measurement of network data transfer
Posted on 01-Jan-2016
Outline
• Motivations
• Hardware setup
• Software tools
• Measurements and their (possible) interpretation
• Prospects
Motivations
• Network transfers to L1 and L2 need low latency
  – For both TEL62-PC and PC-PC transfers, do we know how much it is?
  – For which network protocol is it best?
  – How does it depend on the computer HW?
  – How does it depend on the network interface?
  – How large are the latency fluctuations? GPUs are sensitive…
  – Knowing the fluctuations is important to stay within the 1 ms budget
• Standard software monitoring tools give only averages
• Try to use hardware signals, generated at strategic points inside the software
• Correlate signals from a sender with those from a receiver
Hardware setup
• Two PCs with Gigabit Ethernet interfaces
  – A is a Pentium 4, 2.4 GHz
    • Called PCATE
  – B is a 2×4-core Xeon
    • Called PCGPU
  – Direct Ethernet connection on a hidden network
  – Each PC is equipped with a parallel-port interface
    • Used to generate timing pulses
• LeCroy scope
  – Time measurements
  – Histograms
  – Storage of screenshots
[Photos: the two PCs (PCATE, PCGPU) and the adapter for the parallel port]
Software tools
• Investigate three “protocols”
  – Raw Ethernet packets (socket PF_PACKET, SOCK_RAW)
  – IP packets (socket PF_INET, SOCK_RAW)
  – TCP packets (socket PF_INET, SOCK_STREAM)
• Three pairs of simple senders/receivers
  – The sender
    • Gets from the command line: packet size, number of packets, delay between packets, downscaling factor (see later)
    • Initializes the socket and enters a tight loop, with a delay inside
    • Inside the loop, writes a pulse on the parallel port before and after the send command
  – The receiver
    • After initialization, enters a receive loop and writes a pulse on the parallel port after each received packet
Code example – Sender

    /* Create raw socket */
    sock = socket(AF_INET, SOCK_RAW, PROBE_PROT);
    if (sock < 0) {
        perror("opening raw socket");
        exit(1);
    }
    ...
    if (iloop < 0) iloop = 1000000000;
    for (i = 0; i < iloop; i++) {
        if (i % 50 == 0) {                  /* downscaled packet: mark it */
            buf[0] = 0x01;
            out = 0x01; outb(out, 0x378);   /* send a pulse */
            out = 0x00; outb(out, 0x378);
        } else buf[0] = 0x00;
        if (sendto(sock, buf, buflen, 0, (struct sockaddr *)&server,
                   sizeof(struct sockaddr_in)) < 0)
            perror("writing on raw socket");
        out = 0x02; outb(out, 0x378);       /* pulse after sendto */
        out = 0x00; outb(out, 0x378);
        for (k = 0; k < conv_time; k++);    /* delay loop */
    }
Code example – Receiver

    /* Create raw socket */
    sock = socket(AF_INET, SOCK_RAW, PROBE_PROT);
    if (sock < 0) {
        perror("opening raw socket");
        exit(1);
    }
    ...
    serv_size = sizeof(server);
    do {
        if ((rval = recvfrom(sock, buf, BUFFER_SIZE, 0,
                             (struct sockaddr *)&server, &serv_size)) < 0)
            perror("reading raw socket");
        if (rval == 0)
            printf("Ending connection\n");
        else {
            if (rval == BUFFER_SIZE) {      /* send a pulse */
                outb(0x01, 0x378);
                outb(0x00, 0x378);
            }
            printf("-->%d\n", rval);
        }
    } while (rval != 0);
Software tools
• Maximum rate
  – On the sender, some time is spent on code execution
  – The minimum achievable repetition period between packets varies from ~6 µs to ~10 µs
    • Depending on machine speed, type of protocol, etc.
• Downscaling factor
  – Needed to operate the scope properly at high rates
• If the loop index modulo the downscaling factor is 0, send in the packet the pattern to be written by the receiver on the parallel port, otherwise 0
• Packets are sent at the specified rate, but the scope registers only a fraction
• Additional tools used
  – Wireshark and tcpdump to check packet arrival
  – ifconfig and /proc/interrupts to count packet and interrupt losses
Basic method check
• Are these pulses reliable?
  – A simple check: histogram the width of the pulses generated by the sender
  – Pulse width: ~1.22 µs, sdev 0.04 µs; watch out for the maximum
Parameters used in the tests
• Packet size
  – Small packets (200 bytes) or large packets (1300 bytes)
• Protocols
  – The 3 mentioned before
• Delay between packets
  – Usually from 10 ms down to the minimum
  – Typical sequence: 10, 5, 2, 1 ms, 100, 50, 20, 10 µs
• Measurements
  – Store interesting screenshots
  – Record time difference, sigma, maximum value
    • Time difference = time of rx pulse − time of tx pulse
Lost packets and interrupts
• No lost packets observed at any rate
  – Checked with ifconfig at source and destination
• Interrupt behaviour via /proc/interrupts
  – At high rates the number of interrupts decreases
    • The well-known phenomenon of “interrupt coalescence” in the driver
    • Packets arriving too fast are buffered, and the CPU is interrupted only once
    • For TCP at high rates with 200-byte buffers, interrupts are also reduced because TCP packs many user buffers into one Ethernet frame
  – In any case, measuring TCP performance is more difficult, as the protocol is free to segment user buffers as it likes (i.e. flow control)
Time across sendto
Time difference between the pulse emitted after sendto and the one emitted before it; both pulses are generated on the same machine
Time across sendto - Fluctuations
Count how many times the time exceeds 20 µs (relative to all measurements):
  – Raw: ~5/26000
  – IP: ~13/26000
  – TCP: min ~8/20000 (1 ms delay), max ~402/20000 (100 µs delay, 1300 bytes); 18/26000 with 200 bytes
[Screenshots: a quiet example, and one while moving the mouse… only 15 > 4500. PCATE as sender.]
Transfer time trending
[Plots: PCGPU→PCATE, raw; 200 bytes @ 50 µs, 1000 bytes @ 50 µs, 1300 bytes @ 40 µs, 200 bytes @ 20 µs, 1000 bytes @ 20 µs, 1300 bytes @ 20 µs]
Summary
• Hardware timing system
  – Reliable, and does not interfere with the measurement (at the level of max 10 µs)
• Time spent in the sender
  – A fraction (<10%) of the total transfer time
  – Varies with the protocol type
  – Stable with the packet rate
• Transfer time
  – Down to 50 µs delays it varies little as a function of packet rate
    • Between 50 and 120 µs
  – Below 20 µs it increases (up to 2 ms) for raw, but not for IP
• This setup does not work below ~10 µs
  – Where we are most interested
To be done
• Complete the measurements
  – Both directions
  – All protocols (TCP, maybe new ones)
• Performance as a function of CPU power
  – Use different PCs
  – Add load on the machines
• Test multiple interfaces and switches
• Change the sender to an object driven by an FPGA
  – TEL62 or TALK
• Investigate different protocol features
  – New protocols, or switch features of the old ones
• Test more complex transfer software (i.e. TDBIO)
• Some work hopefully to be done by USA summer students…