Inter-Host Communication. Berkeley Sockets

Introduction to Network Programming in UNIX & LINUX 3-1© D.Zinchin [[email protected]]


Berkeley Sockets

Berkeley Sockets is an API (Application Programming Interface) for different communication protocol suites (TCP/IP, Unix Domain, XNS, etc.).

The Socket API contains a set of system calls for establishing network connections and transferring data:

    socket()                  create endpoint
    bind()                    bind address
    listen()                  specify request queue
    accept()                  wait for connection
    connect()                 connect to peer
    read(), write(),          transfer data
    recv(), send(),
    recvfrom(), sendto(),
    recvmsg(), sendmsg()
    close(), shutdown()       terminate connection


5-Tuple Association and Socket Address

Generic Model

Association = { Protocol, Port A, Address A, Port B, Address B }
Socket Address = { Protocol, Port, Address }

TCP/IP Model

TCP Association = { TCP, TCP Port A, IP Address A, TCP Port B, IP Address B }
UDP Association = { UDP, UDP Port A, IP Address A, UDP Port B, IP Address B }
Socket Address = { TCP/UDP, Port, IP Address }

Note: The Socket Address has a different format in IPv4 and IPv6 because of the different length of the IP Address (32 bits for IPv4, 128 bits for IPv6).

Unix Domain

The Unix Domain protocols are not an actual protocol suite, but a way of performing client/server communication on a single host using the same API that is used for clients and servers on different hosts. They are an alternative to the other inter-process communication (IPC) mechanisms. There are two protocols:

- UNIXSTR  Stream Protocol (analog of TCP)
- UNIXDG   Datagram Protocol (analog of UDP)

A Unix Domain socket is bound to a file path.

Unix Domain Association = { UNIXSTR / UNIXDG, File Path A, 0, File Path B, 0 }
Socket Address = { UNIXSTR / UNIXDG, File Path }

    # netstat -a
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address    Foreign Address   State
    tcp        0      0 *:bootpc         *:*               LISTEN
    tcp        0      0 *:x11            *:*               LISTEN
    tcp        0      0 *:2055           *:*               LISTEN
    tcp        0      0 Knoppix:2055     Knoppix:44992     ESTABLISHED
    tcp        0      0 Knoppix:44992    Knoppix:2055      ESTABLISHED
    udp        0      0 *:2055           *:*
    Active UNIX domain sockets (servers and established)
    Proto RefCnt Flags     Type    State      I-Node  Path
    unix  2      [ ACC ]   STREAM  LISTENING  10226   /var/run/dbus/system_bus_socket
    unix  2      [ ACC ]   STREAM  LISTENING  13345   /ramdisk/tmp/ksocket-knoppix/kdeinit__0
    unix  2      [ ]       DGRAM              2318    @/org/kernel/udev/udevd


Generic Socket Address Structure

A socket address structure is always passed by reference when passed as an argument to any socket function. All these functions are declared to accept the following generic socket address structure:

#include <sys/socket.h>

struct sockaddr {
    sa_family_t sa_family;    /* address family: AF_LOCAL, AF_INET, AF_INET6, … */
    char        sa_data[14];  /* protocol-specific address */
};

IPv4 Socket Address Structure

#include <netinet/in.h>

struct in_addr {
    in_addr_t s_addr;           /* 32-bit IPv4 address, network byte ordered */
};

struct sockaddr_in {
    sa_family_t    sin_family;  /* AF_INET */
    in_port_t      sin_port;    /* 16-bit TCP or UDP port number, network byte ordered */
    struct in_addr sin_addr;    /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8]; /* unused */
};

Unix Domain Address Structure

#include <sys/un.h>

struct sockaddr_un {
    sa_family_t sun_family;     /* AF_LOCAL */
    char        sun_path[108];  /* null-terminated pathname */
};

Socket Address Representation

Note:

By convention, the port and address fields of a Socket Address structure must be filled in Network Byte Order.

This is big-endian order: the most significant byte is stored at the lowest memory address.


Socket Address Encoding Functions

Byte Ordering Functions

#include <netinet/in.h>

uint16_t htons(uint16_t host16bitvalue) ; /* Host TO Network Short converter */

uint32_t htonl(uint32_t host32bitvalue) ; /* Host TO Network Long converter */

uint16_t ntohs(uint16_t net16bitvalue) ; /* Network TO Host Short converter */

uint32_t ntohl(uint32_t net32bitvalue) ; /* Network TO Host Long converter */

Byte Manipulation Functions

#include <strings.h>

void bzero(void *dest, size_t nbytes); /* sets nbytes bytes starting at dest to zero */

void bcopy(const void *src, void *dest, size_t nbytes); /* copies nbytes bytes from src to dest */

int bcmp(const void *ptr1, const void *ptr2, size_t nbytes); /* returns 0 if the byte sequences are identical, nonzero otherwise */

Address Conversion Functions

These functions convert an IP Address between its text form (dotted-decimal notation for IPv4) and its binary form, and vice versa.

#include <arpa/inet.h>

int inet_aton(const char *strptr, struct in_addr *addrptr);  /* ASCII TO Network. Returns 1 on success, 0 on error */

in_addr_t inet_addr(const char *strptr);                     /* Deprecated. Returns INADDR_NONE on error */

int inet_pton(int af, const char *strptr, void *addrptr);    /* Presentation TO Network. Analog of inet_aton(),
                                                                supporting different address families.
                                                                Returns 1 on success, 0 if the string is not
                                                                a valid address, -1 if af is not supported */

char *inet_ntoa(struct in_addr inaddr);                      /* Network TO ASCII. Returns pointer to a static string */

const char *inet_ntop(int af, const void *addrptr,           /* Network TO Presentation. Analog of inet_ntoa(), */
                      char *strptr, socklen_t strlen);       /* supporting different address families */
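The encoding functions above are typically combined when a socket address is prepared. The following is a minimal sketch, not a definitive implementation: the helper names make_ipv4_addr() and ipv4_to_text() are hypothetical and introduced only for illustration, and memset() is used in place of the legacy bzero().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill a sockaddr_in from a dotted-decimal string
 * and a port given in host byte order.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);              /* host -> network byte order */
    if (inet_pton(AF_INET, ip, &sa->sin_addr) != 1)
        return -1;                           /* 0 = bad string, -1 = bad family */
    return 0;
}

/* Hypothetical helper: convert the address back to text.
 * buf must hold at least INET_ADDRSTRLEN bytes. Returns buf, or NULL on error. */
const char *ipv4_to_text(const struct sockaddr_in *sa, char *buf, size_t len)
{
    return inet_ntop(AF_INET, &sa->sin_addr, buf, (socklen_t)len);
}
```

After make_ipv4_addr(&sa, "192.168.1.10", 5678) the two bytes of sa.sin_port in memory are 0x16 0x2E on any host, because 5678 = 0x162E and network byte order puts the most significant byte first.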


Create The Socket

#include <sys/types.h>
#include <sys/socket.h>

int socket (int family, int type, int protocol);

• Creates the Socket, the endpoint for communication.

• Parameter family specifies the Address (or Protocol) Family:

    AF_INET                 IPv4
    AF_INET6                IPv6
    AF_LOCAL (or AF_UNIX)   Unix Domain

• Parameter type specifies the type of socket:

    SOCK_DGRAM    datagram socket
    SOCK_STREAM   stream socket
    SOCK_RAW      raw socket (direct access to the Network Layer)

• Parameter protocol defines the protocol.

  If 0, the default protocol for the socket type is used: UDP for SOCK_DGRAM, TCP for SOCK_STREAM.

  Other values:

    IPPROTO_UDP
    IPPROTO_TCP
    IPPROTO_ICMP  (used only with SOCK_RAW type)
    IPPROTO_RAW   (used only with SOCK_RAW type)

• Returns the Socket Descriptor on success, -1 on error (errno specifies the error).

Note:

Each AF_... constant has a corresponding PF_... constant of the same value, which can be used in the same way.


Bind: Assign the Local Address to Socket

#include <sys/types.h>
#include <sys/socket.h>

int bind (int sockfd, const struct sockaddr *addr, socklen_t addrlen);

• Assigns the local IP Address and Port, specified by the addr parameter, to the socket. A non-specific IP Address is specified by the constant INADDR_ANY; a non-specific port is specified by the value 0.
• A Server binds a well-known IP Address and Port. If INADDR_ANY and a specific Port are given, the Server accepts requests on this Port through any of the host's IP Addresses.
• A Client binds a specific or non-specific IP Address and Port.
• Returns 0 on success, -1 on error.

Example: Client Socket Binding

    struct sockaddr_in cliAddr;
    int sockfd;

    /* create socket */
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) { perror("socket"); exit(1); }

    /* prepare Client binding: any local address, port chosen by the kernel */
    bzero((char *)&cliAddr, sizeof(cliAddr));
    cliAddr.sin_family = AF_INET;
    cliAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    cliAddr.sin_port = htons(0);

    /* bind the socket */
    if (bind(sockfd, (struct sockaddr *)&cliAddr, sizeof(cliAddr)) < 0) { perror("bind"); exit(1); }

Example: Server Socket Binding

    #define SERV_HOST_ADDR "145.9.112.75"
    #define SERV_TCP_PORT  5678

    struct sockaddr_in servAddr;
    int sockfd;

    /* create socket */
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) { perror("socket"); exit(1); }

    /* prepare Server binding: well-known address and port */
    bzero((char *)&servAddr, sizeof(servAddr));
    servAddr.sin_family = AF_INET;
    if (inet_aton(SERV_HOST_ADDR, &servAddr.sin_addr) == 0) {
        /* inet_aton() does not set errno, so perror() is not appropriate here */
        fprintf(stderr, "invalid address: %s\n", SERV_HOST_ADDR);
        exit(1);
    }
    servAddr.sin_port = htons(SERV_TCP_PORT);

    /* bind the socket */
    if (bind(sockfd, (struct sockaddr *)&servAddr, sizeof(servAddr)) < 0) { perror("bind"); exit(1); }


Connect: Assign the Foreign Address to Socket

#include <sys/types.h>
#include <sys/socket.h>

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

• Used to specify the foreign Address.

• For connection-oriented protocols (TCP, socket type = SOCK_STREAM) this call is used only by the Client.
  It establishes the actual connection with the Server, using the address specified by the addr parameter.
  (TCP sends a SYN segment, waits for the SYN+ACK reply and reports the cause of a possible connection failure.)

• For connectionless protocols (UDP, socket type = SOCK_DGRAM) this call is optional.
  It can be used by the Server or by the Client to store an already known foreign Address and to use it for the following datagram sending / receiving. In this case the foreign Address does not have to be specified for each datagram sent and is not extracted from each datagram received.
  For a connectionless protocol no actual connection with the Server is established.
  A “Connection refused” error can be detected only by a later system call that really sends or receives data.

• The Client does not have to call bind() before calling connect(). In this case connect() also assigns the local address to the socket (as is done by bind() called by the Client with INADDR_ANY and port 0).

• Connection-oriented sockets can successfully connect() only once.
  Connectionless sockets can use connect() multiple times to change their association.
  Connectionless sockets can dissolve the association by specifying the AF_UNSPEC family in the call to connect() (or a NULL pointer to the address structure on some UNIX systems – see ‘man connect’).

• Returns 0 on success, -1 in case of error.
  If connect() fails, a connection-oriented socket is no longer usable and must be closed.
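The use of connect() on a connectionless socket can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names udp_bind_loopback(), udp_connect() and udp_disconnect() are hypothetical, the loopback address is used only to keep the example self-contained, and the return value of the AF_UNSPEC call differs between systems (Linux returns 0; some BSD-derived systems dissolve the association but still report an error).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1 on a port
 * chosen by the kernel. The resulting address is returned in *bound.
 * Returns the descriptor, or -1 on error. */
int udp_bind_loopback(struct sockaddr_in *bound)
{
    socklen_t len = sizeof(*bound);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    assert(bound != NULL);
    if (fd < 0)
        return -1;
    memset(bound, 0, sizeof(*bound));
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bound->sin_port = htons(0);                       /* non-specific port */
    if (bind(fd, (struct sockaddr *)bound, sizeof(*bound)) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Store peer as the default destination; no packet is sent by this call. */
int udp_connect(int fd, const struct sockaddr_in *peer)
{
    return connect(fd, (const struct sockaddr *)peer, sizeof(*peer));
}

/* Dissolve the association by connecting to an AF_UNSPEC address. */
int udp_disconnect(int fd)
{
    struct sockaddr sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_family = AF_UNSPEC;
    return connect(fd, &sa, sizeof(sa));
}
```

After udp_connect() the client can use send() and recv() without an address argument; after udp_disconnect() it has to go back to sendto() and recvfrom().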


Send Data Through Socket

• These calls are used to transmit a message buf of length buflen to another transport end-point.
• Calls send() and write() may be used only when the socket is in a connected state. (Note: on some UNIX systems an empty datagram cannot be sent on a connected UDP socket with write(); send() must be used instead.)
• Calls sendto() and sendmsg() may be used at any time.
• The target address is specified by parameter to of length tolen.
• If the message is too long to pass atomically through the underlying protocol, the error EMSGSIZE is returned and the message is not transmitted.
• If the socket does not have enough buffer space available to hold the message being sent, send() blocks, unless the socket has been placed in non-blocking I/O mode (see fcntl).
• The flags parameter is formed from the bitwise OR of zero or more of the following:
    MSG_OOB        Send "out-of-band" data. Supported only by SOCK_STREAM sockets of the AF_INET (AF_INET6) families.
    MSG_DONTROUTE  The SO_DONTROUTE option is turned on for the duration of the operation. It is used only by diagnostic or routing programs.
• The sendmsg() call uses a msghdr structure to minimize the number of directly supplied parameters.
• These calls return the number of bytes sent, or -1 if an error occurred.

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>      /* write() */

ssize_t write   (int sockdescr, const void *buf, size_t buflen);
ssize_t send    (int sockdescr, const void *buf, size_t buflen, int flags);
ssize_t sendto  (int sockdescr, const void *buf, size_t buflen, int flags, const struct sockaddr *to, socklen_t tolen);

struct msghdr {
    void         *msg_name;        /* optional address */
    socklen_t     msg_namelen;     /* size of address */
    struct iovec *msg_iov;         /* scatter/gather array: the list of memory fragments
                                      to be read/written by a single system call */
    size_t        msg_iovlen;      /* number of elements in msg_iov */
    void         *msg_control;     /* ancillary data */
    socklen_t     msg_controllen;  /* ancillary data buffer length */
};

ssize_t sendmsg (int sockdescr, const struct msghdr *msg, int flags);
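The scatter/gather array of msghdr can be illustrated with a short sketch. It assumes a connected socket (so msg_name stays NULL) and no ancillary data; the function name send_two_parts() is hypothetical and not part of the Socket API.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send a header and a body that live in separate
 * buffers as ONE message, using a single system call.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_two_parts(int fd, const void *hdr, size_t hdrlen,
                       const void *body, size_t bodylen)
{
    struct iovec iov[2];
    struct msghdr msg;

    assert(hdr != NULL && body != NULL);

    iov[0].iov_base = (void *)hdr;    /* iov_base is not const-qualified */
    iov[0].iov_len  = hdrlen;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = bodylen;

    memset(&msg, 0, sizeof(msg));     /* no address, no ancillary data */
    msg.msg_iov    = iov;
    msg.msg_iovlen = 2;

    return sendmsg(fd, &msg, 0);
}
```

On a datagram socket the peer receives both fragments as a single datagram, so no intermediate copy into one contiguous buffer is needed on the sending side.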


Receive Data from Socket

• These calls are used to receive a message from another socket into buffer buf of length buflen.
• Calls read() and recv() are normally used only on a connected socket.
• Calls recvfrom() and recvmsg() may be used to receive data on a socket whether it is in a connected state or not.
• If parameter from is not a NULL pointer, the source address of the message is filled in.
• Parameter fromlen is a value-result parameter: it is initialized to the size of the buffer from and modified on return to indicate the actual size of the address stored there.
• If a message is too long to fit in the supplied buffer, excess bytes may be discarded, depending on the type of socket the message is received from.
• If no messages are available at the socket, the receive call waits for a message to arrive, unless the socket is non-blocking, in which case -1 is returned with errno set to EWOULDBLOCK (= EAGAIN; see fcntl()).
• The flags parameter is formed by ORing zero or more of the following:
    MSG_OOB   Read any "out-of-band" data present on the socket rather than the regular "in-band" data.
    MSG_PEEK  "Peek" at the data present on the socket; the data is returned, but not consumed, so that a subsequent receive operation will see the same data.
• The recvmsg() call uses a msghdr structure to minimize the number of directly supplied parameters.
• These calls return the number of bytes received, or -1 if an error occurred.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>      /* read() */

ssize_t read     (int sockdescr, void *buf, size_t buflen);
ssize_t recv     (int sockdescr, void *buf, size_t buflen, int flags);
ssize_t recvfrom (int sockdescr, void *buf, size_t buflen, int flags, struct sockaddr *from, socklen_t *fromlen);
ssize_t recvmsg  (int sockdescr, struct msghdr *msg, int flags);
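The effect of MSG_PEEK can be shown with a small sketch; the function name peek_first_byte() is hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: look at the first byte of the next message
 * without consuming it.
 * Returns the byte value (0..255), or -1 on error, on end of stream,
 * or when the next datagram is empty. */
int peek_first_byte(int fd)
{
    unsigned char b;
    ssize_t n;

    assert(fd >= 0);
    n = recv(fd, &b, 1, MSG_PEEK);   /* data stays queued in the socket */
    return (n == 1) ? (int)b : -1;
}
```

Because the data is not consumed, calling peek_first_byte() twice in a row yields the same value, and a following recv() without MSG_PEEK still returns the complete message.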


Close the Socket

• These calls close the socket connection.

• The shutdown() call shuts down all or part of a full-duplex connection.
  If how is 0 (SHUT_RD), further receives will be disallowed.
  If how is 1 (SHUT_WR), further sends will be disallowed.
  If how is 2 (SHUT_RDWR), further sends and receives will be disallowed.

• The system returns from these calls immediately, but in the case of TCP the kernel still tries to send the already queued data (unless the SO_LINGER socket option is specified).

• These calls return 0 on success, -1 in case of failure.

int close (int sockdescr);
int shutdown (int sockdescr, int how);
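A typical use of a partial shutdown is the "half-close": the sender announces the end of its data while keeping the receiving direction open. The sketch below assumes a connected stream socket; the function name send_request_and_half_close() is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole request, then disallow further
 * sends. The peer reads the data followed by end-of-stream (recv()
 * returns 0), while this side can still receive the reply.
 * Returns 0 on success, -1 on error. */
int send_request_and_half_close(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);   /* may send less than requested */
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return shutdown(fd, SHUT_WR);          /* same as how = 1 */
}
```

The descriptor itself stays open after shutdown(); close() is still required to release it.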

Get Local and Foreign Address of the Socket

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);

• These functions are used to extract the local and the foreign IP Address and Port of the Socket.

• Return 0 on success, -1 on error.


UDP Socket Example 1. Iterative Server

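[Figure: flowchart of the UDP iterative server and its client.

UDP Client: socket( ) -> sendto( ), data (request) -> recvfrom( ), data (reply) -> …process reply… -> end session ? (no: send the next request; yes: close( ), exit( )).

UDP Server: socket( ) -> bind( ) to the well-known port -> recvfrom( ), blocks until a datagram is received from a client -> …process request… -> sendto( ), data (reply) -> end session ? / stop server ? (no: back to recvfrom( ); yes: close( ), exit( )).]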


UDP Socket Example 2. Using connect().

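[Figure: flowchart of the UDP client and server using connect().

UDP Client: socket( ) -> connect( ) -> send( ), data (request) -> recv( ), data (reply) -> …process reply… -> end session ? (no: send the next request; yes: close( ), exit( )).

UDP Server: socket( ) -> recvfrom( ) for the first request -> connect( ) to the Client address -> send( ), data (reply) -> end session ? (yes: connect(..AF_UNSPEC or NULL.. ) to dissolve the association) -> stop server ? (no: back to recvfrom( ); yes: close( ), exit( )).]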



UDP Socket Example 3. Concurrent Server.

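[Figure: flowchart of the UDP concurrent server and its client.

UDP Client: socket( ) -> sendto( ), data (request) to the well-known port -> recvfrom( ), data (1st reply) -> connect( ) to the sub-server address taken from the 1st reply -> recv( ) / …process reply… -> end session ? (yes: close( ), exit( )).

UDP Server (parent): socket( ) -> recvfrom(..MSG_PEEK.. ) -> fork( ) -> is parent ? yes -> stop server ? (no: back to waiting for requests; yes: close( ), exit( )).

Sub-server (is parent ? no): socket( ) for a new socket -> recvfrom( ) the request -> close parent socket -> connect( ) to the Client -> send( ), data (reply) -> end session ? (yes: close( ), exit( )).]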



UDP Examples Synchronization Problems

Problem 1. End of response notification

The Client must know when the last portion of data has arrived from the Server.

Possible Solution

The last datagram sent by the Server is empty.

Problem 2. Connection timeout

The Client does not always receive the “connection refused” error, even if it uses the connect() system call.

Possible Solution

The connection timeout must be handled by the Client.

Problem 3. Lost and disordered datagrams

Some datagrams can be lost because of:

- socket buffer overflow, when the Server sends more quickly than the Client is able to receive and process
- network problems
- unexpected termination of the peer process

The datagrams can be received by the Client in a different order than they were sent by the Server, because each datagram can take its own route.

Possible Solution

Implement sequence and flow control by means of Acknowledgement with Retransmission algorithms.


TFTP – Trivial File Transfer Protocol
(Example of UDP-based standard application protocol)

TFTP is a standard UDP-based application protocol that provides a simple method of transferring files between two hosts.

It was developed and standardized in 1981. It is much smaller and simpler than FTP (File Transfer Protocol).

Unlike FTP, TFTP provides only file transfer and does not provide user authentication, directory listing, etc.

Because of its simplicity, TFTP can be used to bootstrap LAN workstations.

TFTP Message Formats

TFTP supports 2 transfer modes: “octet” and “netascii”.

    opcode (2 bytes)   rest of the message                            message type
    1                  file name | 0 | transfer mode | 0              read request (RRQ)
    2                  file name | 0 | transfer mode | 0              write request (WRQ)
    3                  block # (2 bytes) | data, up to 512 bytes      data
    4                  block # (2 bytes)                              acknowledgement (ACK)
    5                  errcode (2 bytes) | error message | 0          error

Error codes:
    1 - File not found
    2 - Access violation
    3 - Disk full
    4 - Illegal TFTP operation
    5 - Unknown port (transfer ID)
    6 - File already exists
    7 - No such user

All 2-byte fields (opcode, block #, errcode) are stored in Network Byte Order.
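The message layouts above can be encoded with a few lines of C. This is a sketch under stated assumptions: the enum constants and the function names tftp_build_request() and tftp_build_ack() are hypothetical names introduced here, not part of any standard library.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Opcodes as listed in the message format table (names are illustrative). */
enum { TFTP_RRQ = 1, TFTP_WRQ = 2, TFTP_DATA = 3, TFTP_ACK = 4, TFTP_ERROR = 5 };

/* Build an RRQ or WRQ message: opcode | file name | 0 | transfer mode | 0.
 * Returns the message length, or 0 if the buffer is too small. */
size_t tftp_build_request(uint8_t *buf, size_t cap, uint16_t opcode,
                          const char *filename, const char *mode)
{
    size_t flen, mlen, total;
    uint16_t op = htons(opcode);              /* 2-byte field, network byte order */

    assert(buf != NULL && filename != NULL && mode != NULL);
    flen = strlen(filename);
    mlen = strlen(mode);
    total = 2 + flen + 1 + mlen + 1;
    if (total > cap)
        return 0;
    memcpy(buf, &op, 2);
    memcpy(buf + 2, filename, flen + 1);          /* includes terminating 0 */
    memcpy(buf + 2 + flen + 1, mode, mlen + 1);   /* includes terminating 0 */
    return total;
}

/* Build an ACK message: opcode 4 | block #. Returns 4, or 0 if buf is too small. */
size_t tftp_build_ack(uint8_t *buf, size_t cap, uint16_t block)
{
    uint16_t op = htons(TFTP_ACK), blk = htons(block);

    assert(buf != NULL);
    if (cap < 4)
        return 0;
    memcpy(buf, &op, 2);
    memcpy(buf + 2, &blk, 2);
    return 4;
}
```

For example, a read request for the file "a.txt" in "octet" mode is 14 bytes long: 2 bytes of opcode, 6 bytes of null-terminated file name and 6 bytes of null-terminated mode.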


TFTP Transfer Scenarios

• The TFTP Server is concurrent.

• The TFTP Server well-known port is 69/udp.

• The TFTP Client initiates the connection by sending RRQ (read from Server) or WRQ (write to Server) as the first datagram.

• The Server main process spawns a child Sub-Server process to handle the Client request. The main process then returns to listening on the well-known port.

• The Sub-Server creates a new socket and binds it to a unique local port. All following responses to the Client are then sent from this port.

• The Client reads the Sub-Server address from the 1st response and then uses it for the rest of the session.

• A data block shorter than 512 bytes is recognized as the last data portion.

The TFTP Server is a standard UNIX daemon.

The TFTP Client is implemented as the standard UNIX utility tftp.

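[Figure: two message sequence diagrams.

Reading a File (Client is receiver, Server is sender):
    Client -> Server: RRQ
    Server -> Client: data, block #1
    Client -> Server: ACK, block #1
    Server -> Client: data, block #2
    Client -> Server: ACK, block #2
    Server -> Client: data, block #3
    . . .

Writing a File (Client is sender, Server is receiver):
    Client -> Server: WRQ
    Server -> Client: ACK, block #0
    Client -> Server: data, block #1
    Server -> Client: ACK, block #1
    Client -> Server: data, block #2
    Server -> Client: ACK, block #2
    . . .]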


“Sorcerer’s Apprentice Syndrome”

Sorcerer's Apprentice Syndrome (SAS) is a particularly bad network protocol flaw in the original versions of TFTP. It occurred in case of packet delay, which was not taken into account when the protocol was designed. It was named after the “Sorcerer's Apprentice” segment of the Walt Disney motion picture Fantasia. A packet delay led to a growing number of duplicated packets, followed by a “chain reaction”, congestive collapse of the network and transfer failure.

Original Bad Design:
• Both Sender and Receiver use timeout with retransmission
• On receiving a duplicated acknowledgement, the Sender retransmits the data

Fixed Design:
• Receiver does not have a retransmission timer
• Sender ignores duplicated acknowledgements

[Figure: event traces of the Sender and the Receiver for both designs.

Original Bad Design

    Sender                                                      Receiver
    send DATA(n)
    (time out) retransmit DATA(n)                               receive DATA(n), send ACK(n)
    receive ACK(n), send DATA(n+1)                              receive DATA(n) (duplicate), send ACK(n) (duplicate)
    receive ACK(n) (duplicate), send DATA(n+1) (duplicate)      receive DATA(n+1), send ACK(n+1)
    receive ACK(n+1), send DATA(n+2)                            receive DATA(n+1) (duplicate), send ACK(n+1) (duplicate)
    receive ACK(n+1) (duplicate), send DATA(n+2) (duplicate)    . . .
    . . .

Fixed Design

    Sender                                                      Receiver
    send DATA(n)
    (time out) retransmit DATA(n)                               receive DATA(n), send ACK(n)
    receive ACK(n), send DATA(n+1)                              receive DATA(n) (duplicate), send ACK(n) (duplicate)
    receive ACK(n) (duplicate), (don’t send anything)           . . .
    receive ACK(n+1), send DATA(n+2)
    . . .]


Listen, Accept - TCP Server-Specific System Calls

#include <sys/socket.h>

int listen (int sockfd, int backlog);

• Assigns the length of the queue for TCP connection requests.
• Parameter backlog specifies the number of pending requests queued by the system.
• If the queue is full, the connection request is not queued and the Client's TCP retransmits it.
• Returns 0 on success, -1 on error.

int accept (int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);

• Returns the next completed connection from the front of the completed connection queue.
• If the completed connection queue is empty, the process is put to sleep.
• The cliaddr and addrlen parameters are used to return the address of the connected Client.
• Returns a new socket descriptor automatically created by the kernel. Returns -1 in case of error.
• Usually a Server creates only one listening socket, which then exists for the whole lifetime of the server. The kernel creates one connected socket for each accept()-ed client connection. When a Server (or concurrent Sub-Server) has finished serving a given client, the connected socket is closed.

[Figure: TCP connection queues. An arriving SYN creates an entry in the incomplete connection queue (SYN_RCVD state). When the three-way handshake completes, the entry is moved to the completed connection queue (ESTABLISHED state), from which the server takes it with accept. The sum of both queues cannot exceed backlog.]

[Figure: 3-Way handshake and TCP connection queues.

    client                                  server
    connect called
          ------------ SYN n ------------>  create entry on incomplete queue
          <------ SYN k, ACK n+1 ---------
    connect returns (after one RTT)
          ---------- ACK k+1 ------------>  entry moved to completed queue,
                                            accept returns (after one RTT)

RTT = round-trip time.]
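The server-side sequence socket(), bind(), listen() can be sketched as below. This is a minimal illustration, not a definitive implementation: the function name tcp_listen_loopback() is hypothetical, and the loopback address with a kernel-chosen port is used only to keep the example self-contained (a real Server would bind its well-known port).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on 127.0.0.1 with a
 * port chosen by the kernel. The bound address is returned in *addr.
 * Returns the listening descriptor, or -1 on error. */
int tcp_listen_loopback(struct sockaddr_in *addr, int backlog)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    assert(addr != NULL);
    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, backlog) < 0 ||                          /* create the queues */
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Once listen() has returned, the kernel completes three-way handshakes on its own: a Client's connect() can succeed before the Server calls accept(), and accept() then simply takes the established connection from the completed connection queue.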


TCP Socket Examples. Iterative & Concurrent Server.

[Figure: flowchart of the TCP iterative and concurrent servers and their client.

TCP Client: socket( ) -> connect( ), connection establishment -> send( ), data (request) -> recv( ), data (reply) -> …process reply… -> end session ? (no: send the next request; yes: close( ), exit( )).

TCP Server:
    listenSock = socket(...)
    bind (listenSock,…)
    listen (listenSock,…)
    acceptSock = accept (listenSock,…)     blocks until connection request from client
    fork( )                                concurrent server only
    is parent ? yes: close (acceptSock), then stop server ? (no: back to accept; yes: close(listenSock), exit( ))
    is parent ? no: close (listenSock)     concurrent sub-server only
    recv (acceptSock,…)                    blocks until data received from client
    send (acceptSock,…)
    end session ? no: back to recv
    end session ? yes: close (acceptSock)
    exit( )                                concurrent sub-server only
    stop server ? (no: back to accept; yes: close(listenSock), exit( ))    iterative server only]


UDP and TCP Server Comparison

                                        UDP Server                               TCP Server

Readiness to get request from Client    • Executes recvfrom()                    • Executes listen()
                                                                                 • Executes accept() to get an already
                                                                                   established connection from the queue

Content of Message from Client          • All messages:                          • 1st message: Connection request
                                          Application data with Client Address     Next messages: Application data
                                        • The same call recvfrom() for all       • Separate call recv() to read the
                                          the messages from Client                 Application data

Concurrent Server Implementation        • Creates a separate socket              • Accepts from the kernel a new socket
                                          on another port                          descriptor, already connected to the
                                                                                   client through the same port

Reliability of service                  • Must be provided at application layer  • Provided by kernel on transport layer


Socket Options and Control Operations

There are various ways to get and set the options that affect a socket:
• System call fcntl() - provides control functions on files and sockets
• System call ioctl() - provides control functions on files, sockets, terminals and devices
• System calls getsockopt() / setsockopt()

#include <sys/types.h>
#include <sys/socket.h>

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);

int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);

• These system calls get / set the value of a Socket Option for the socket specified by descriptor sockfd.

• Parameter level specifies the subsystem responsible for the option, for example:

    SOL_SOCKET – general socket code
    IPPROTO_IP, IPPROTO_TCP – protocol-specific code

• Parameter optname specifies the ID of the specific option.

• Parameter optval is used to specify / extract the value of the option.

• Parameter optlen specifies the length of optval.

• The option can be a binary flag (0 or 1) or a more complex value.

• The following socket options are inherited by a connected TCP socket from the listening socket: SO_DEBUG, SO_DONTROUTE, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_RCVLOWAT, SO_SNDBUF, SO_SNDLOWAT, TCP_MAXSEG, and TCP_NODELAY. To ensure that one of these socket options is set for the connected “accept” socket when the three-way handshake completes, we must set that option for the “listen” socket.

Example: don't wait for the TIME_WAIT state delay to expire before a TCP Server restart.

    int on = 1;
    …
    if (setsockopt(sockListen, SOL_SOCKET, SO_REUSEADDR, (char *)&on, sizeof(on)) < 0) {
        perror("setsockopt");
    }
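Reading an option follows the same pattern, with optlen passed as a value-result parameter. The sketch below uses hypothetical wrapper names (socket_type(), set_bool_option(), get_bool_option()) that are not part of the Socket API.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: return the socket type (SOCK_STREAM, SOCK_DGRAM, ...)
 * read through the SO_TYPE option, or -1 on error. */
int socket_type(int fd)
{
    int type;
    socklen_t len = sizeof(type);     /* in: buffer size, out: value size */

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}

/* Hypothetical helper: switch a binary-flag SOL_SOCKET option on or off.
 * Returns 0 on success, -1 on error. */
int set_bool_option(int fd, int optname, int enabled)
{
    int value = enabled ? 1 : 0;

    return setsockopt(fd, SOL_SOCKET, optname, &value, sizeof(value));
}

/* Hypothetical helper: read a binary-flag SOL_SOCKET option.
 * Returns 1 if set, 0 if clear, -1 on error. */
int get_bool_option(int fd, int optname)
{
    int value = 0;
    socklen_t len = sizeof(value);

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, optname, &value, &len) < 0)
        return -1;
    return value != 0;                /* some systems return a nonzero value other than 1 */
}
```

For example, set_bool_option(fd, SO_BROADCAST, 1) on a UDP socket permits sending to a broadcast address, and get_bool_option(fd, SO_BROADCAST) then reports 1.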


Socket Options

Level        Name            Description                                              Settable                     Data Type

SOL_SOCKET   SO_DEBUG        enables recording of debugging information               yes                          int (bool)
                             (for processes with EUID=0 only)
             SO_REUSEADDR    enables local address reuse                              yes                          int (bool)
             SO_KEEPALIVE    enables sending of keep-alive messages on connections    yes                          int (bool)
             SO_DONTROUTE    enables routing bypass for outgoing messages             yes                          int (bool)
                             (same as flag MSG_DONTROUTE)
             SO_LINGER       linger if data present: block shutdown(), close()        yes                          struct linger
                             calls until queued data is sent or timeout expires
             SO_BROADCAST    permits transmission of broadcast messages               yes                          int (bool)
             SO_OOBINLINE    enables reception of out-of-band data in band            yes                          int
                             (else, OOB data is received with MSG_OOB flag only)
             SO_SNDBUF       buffer size for output                                   yes                          int
             SO_RCVBUF       buffer size for input                                    yes                          int
             SO_SNDLOWAT     minimum byte count for output                            no in Linux                  int
             SO_RCVLOWAT     minimum byte count for input                             yes (no in old Linux)        int
             SO_SNDTIMEO     timeout value for output                                 yes (no in old Linux)        struct timeval
             SO_RCVTIMEO     timeout value for input                                  yes (no in old Linux)        struct timeval
             SO_TYPE         get the type of the socket                               no                           int
             SO_ERROR        get and clear error on the socket                        no                           int

IPPROTO_TCP  TCP_MAXSEG      get TCP maximum segment size                             no                           int
             TCP_NODELAY     enables or disables the Nagle optimization               yes                          int (bool)
                             algorithm for TCP sockets

IPPROTO_UDP  UDP_NOCHECKSUM  enables sending of datagrams with checksum=0             yes                          int (bool)

IPPROTO_IP   IP_OPTIONS      options in IP header                                     yes                          char[]


#include <unistd.h>
#include <fcntl.h>

int fcntl(int fildes, int cmd, /*arg*/ ...);

    F_SETFL, O_NONBLOCK    Non-blocking I/O.
    F_SETFL, O_ASYNC       Signal-driven I/O. SIGIO is sent when the socket status changes.
    F_SETOWN, F_GETOWN     Set / Get the socket owner. The owner is the recipient of SIGIO and SIGURG signals.

#include <sys/ioctl.h>

int ioctl(int fd, int request, … /* void *arg */);

• A common use of ioctl() by network programs (typically servers) is to obtain information on all the host's interfaces: the interface addresses, whether the interface supports broadcasting, multicasting, etc.

Socket Control Operations

Category    request          Description                              Datatype

Socket      SIOCATMARK       At out-of-band mark ?                    int
            SIOCSPGRP        Set process ID / group ID of socket      int
            SIOCGPGRP        Get process ID / group ID of socket      int

File        FIONBIO          Set/clear nonblocking flag               int
            FIOASYNC         Set/clear asynchronous I/O flag          int
            FIONREAD         Get # of bytes in receive buffer         int
            FIOSETOWN        Set process ID / group ID of file        int
            FIOGETOWN        Get process ID / group ID of file        int

Interface   SIOCGIFCONF      Get list of all interfaces               struct ifconf
            SIOCSIFADDR      Set interface address                    struct ifreq
            SIOCGIFADDR      Get interface address                    struct ifreq
            SIOCSIFFLAGS     Set interface flags                      struct ifreq
            SIOCGIFFLAGS     Get interface flags                      struct ifreq
            SIOCSIFDSTADDR   Set point-to-point address               struct ifreq
            SIOCGIFDSTADDR   Get point-to-point address               struct ifreq
            SIOCSIFBRDADDR   Set broadcast address                    struct ifreq
            SIOCGIFBRDADDR   Get broadcast address                    struct ifreq
            SIOCSIFNETMASK   Set subnet mask                          struct ifreq
            SIOCGIFNETMASK   Get subnet mask                          struct ifreq
            SIOCSIFMETRIC    Set interface metric                     struct ifreq
            SIOCGIFMETRIC    Get interface metric                     struct ifreq
            SIOCGIFMTU       Get interface MTU                        struct ifreq
            SIOCxxx          (many more; implementation-dependent)

ARP         SIOCSARP         Create/modify ARP entry                  struct arpreq
            SIOCGARP         Get ARP entry                            struct arpreq
            SIOCDARP         Delete ARP entry                         struct arpreq

Routing     SIOCADDRT        Add route                                struct rtentry
            SIOCDELRT        Delete route                             struct rtentry


I/O Multiplexing

Input/Output Multiplexing is the simultaneous handling of 2 or more different I/O channels.

Examples

1) A printer connected to the network waits simultaneously for requests:
   - from local host processes - on a UNIX Domain STREAM socket
   - from remote processes - on an IPv4 TCP socket

2) The superserver inetd waits for requests for multiple different services on different ports and invokes the corresponding service as a separate sub-server process when a specific request is accepted.

[Figure: a single Process holds two socket descriptors, sd1 and sd2, which refer to the kernel sockets sock1 and sock2.]

I/O Multiplexing Solution 1. Fork child process per channel

[Figure: the Process forks Child Process 1 and Child Process 2. Child Process 1 reads from sock1 through sd1, Child Process 2 reads from sock2 through sd2, and both write what they receive to a pipe through descriptor fd. The parent Process reads from the pipe through its own descriptor fd.]

I/O Multiplexing Solution 2. Polling

• Set both sockets to non-blocking mode

    fcntl (sd1, F_SETFL, fcntl(sd1, F_GETFL, 0) | O_NONBLOCK);
    fcntl (sd2, F_SETFL, fcntl(sd2, F_GETFL, 0) | O_NONBLOCK);

• Read from both sockets and, if nothing is available, sleep for the polling timeout

    while (…)
    {
        /* try to read from 1st socket */
        len = read (sd1, buff, sizeof(buff));
        if (len >= 0) break;                        /* go to data processing */
        if (errno != EWOULDBLOCK) { perror("read"); }

        /* try to read from 2nd socket */
        len = read (sd2, buff, sizeof(buff));
        if (len >= 0) break;                        /* go to data processing */
        if (errno != EWOULDBLOCK) { perror("read"); }

        /* provide polling timeout sleep */
        sleep (TIMEOUT);
    }

    /* begin accepted data processing */
    …

Note: On Linux the following two error codes have the same value (other systems may define them differently):

    EWOULDBLOCK = EAGAIN


I/O Multiplexing Solution 3. Signal-Driven I/O

• Establish a handler for the SIGIO signal.

    void iohandler (int sig)
    {…}
    …
    signal (SIGIO, iohandler); /* or using sigaction() */

• Declare the process as the owner of the sockets

    fcntl (sd1, F_SETOWN, getpid( ) );
    fcntl (sd2, F_SETOWN, getpid( ) );

• Enable signal-driven I/O

    fcntl (sd1, F_SETFL, fcntl(sd1, F_GETFL, 0) | O_ASYNC);
    fcntl (sd2, F_SETFL, fcntl(sd2, F_GETFL, 0) | O_ASYNC);

The Problems

1) The signal does not carry information about the descriptor on which the event occurred.

2) This model works well with UDP, where the signal is sent by the kernel when:
   - a new datagram arrives
   - an asynchronous error occurs

   In the case of TCP the signal is sent on any socket status change:
   - connect request accepted, connection established, disconnect request accepted, disconnected, data arrived, data sent, error occurred, etc.

   As a result, it is difficult to recognize the proper event for handling the data.


I/O Multiplexing: System Call select

#include <sys/select.h>   /* According to POSIX */

#include <sys/time.h>     /* According to earlier standards */
#include <sys/types.h>
#include <unistd.h>

struct timeval {
    long tv_sec;     /* seconds */
    long tv_usec;    /* microseconds */
};

int select (int maxFDplus1, fd_set *readFDs, fd_set *writeFDs, fd_set *exceptFDs, struct timeval *timeout);

FD Set

Type fd_set is a bitmask. Each bit flag in the mask corresponds to the file descriptor whose number is equal to the index of this bit flag.

FD Set Modification Macros:

FD_ZERO (fd_set *set);            /* empty the set */
FD_SET (int fd, fd_set *set);     /* set one bit ON */
FD_CLR (int fd, fd_set *set);     /* set one bit OFF */
FD_ISSET (int fd, fd_set *set);   /* test one bit */

Example. Wait for input on fd 0 for up to 5 seconds

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <unistd.h>
    …
    fd_set rfds;
    struct timeval tv;
    int retval;

    FD_ZERO(&rfds);
    FD_SET(0, &rfds);
    tv.tv_sec = 5;
    tv.tv_usec = 0;

    retval = select(1, &rfds, NULL, NULL, &tv);

    if (retval == -1) {
        perror("select()");
    } else if (retval) {
        printf("Data is available now.\n");   /* FD_ISSET(0, &rfds) will be true. */
    } else {
        printf("No data within five seconds.\n");
    }

• This system call allows the process to wait for a number of file descriptors to change status:
    - Descriptors in set readFDs are watched to become ready for non-blocking read
    - Descriptors in set writeFDs are watched to become ready for non-blocking write
    - Descriptors in set exceptFDs are watched for exceptions
• Specific events are not watched if the corresponding FD Set parameter is NULL.
• Parameter maxFDplus1 specifies the maximal width of the FD Sets, to avoid checking all the bit flags in each FD Set.
• Parameter timeout specifies the wait period as follows:
    - NULL – wait until an I/O event occurs.
    - pointer to structure with positive values – wait no more than the specified time.
    - pointer to structure with tv_sec=tv_usec=0 – don’t wait (non-blocking check, used by polling)
• Return value is the number of descriptors where an event occurred. The 3 FD Sets are modified to indicate the affected descriptors.
• Return value is 0 if the timeout expired, -1 if an error occurred.
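Applied to the two-channel multiplexing problem, select() can be used as in the following sketch. The function name wait_readable() and its return-value convention are hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd1 or fd2 becomes readable, but no
 * longer than timeout_ms milliseconds.
 * Returns the ready descriptor (fd1 is preferred if both are ready),
 * -1 if the timeout expired, -2 if select() failed. */
int wait_readable(int fd1, int fd2, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = (fd1 > fd2) ? fd1 : fd2;
    int n;

    /* fd_set cannot represent descriptors >= FD_SETSIZE */
    assert(fd1 >= 0 && fd2 >= 0 && maxfd < FD_SETSIZE);

    FD_ZERO(&rfds);                  /* the sets must be rebuilt before every call */
    FD_SET(fd1, &rfds);
    FD_SET(fd2, &rfds);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -2;
    if (n == 0)
        return -1;                   /* timeout expired */
    return FD_ISSET(fd1, &rfds) ? fd1 : fd2;
}
```

Unlike the polling solution, the process sleeps inside the kernel and is woken as soon as data arrives on either descriptor, so no fixed sleep interval is needed.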


#include <sys/select.h>  /* According to POSIX */

#include <sys/time.h>    /* According to earlier standards */
#include <sys/types.h>
#include <unistd.h>

struct timespec {
    long tv_sec;     /* seconds */
    long tv_nsec;    /* nanoseconds */
};

int pselect (int maxFDplus1, fd_set *readFDs, fd_set *writeFDs, fd_set *exceptFDs, const struct timespec *timeout, const sigset_t *sigmask);

• This system call lets the process wait simultaneously:
 - for I/O events (for a number of file descriptors to change status)
 - for signals
• It differs from select() only in the format of the timeout parameter and in the additional parameter sigmask. The sigmask is the signal mask that is installed atomically for the duration of the call and restored on return, so the desired signals can be accepted while the process waits.

select() + sigsuspend() = pselect()

Example 1. Race condition.
If the signal occurs in the critical region (after the test of intr_flag and before the call to select), it will be lost if select blocks forever.

void sighandler(int sig)
{
    intr_flag = 1;
}
…

if (intr_flag)
    handle_intr();                /* handling before select */
                                  /* critical region: a signal arriving here is not noticed */
if ( (nready = select( ... ) ) < 0) {
    if (errno == EINTR) {
        if (intr_flag)
            handle_intr();        /* handling after select */
    }
    ...
}

Example 2. Reliable solution.
…

sigset_t newmask, oldmask, zeromask;
sigemptyset(&zeromask);
sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);

sigprocmask (SIG_BLOCK, &newmask, &oldmask);    /* block SIGINT */
if (intr_flag)
    handle_intr();                              /* handle the signal */
if ( (nready = pselect ( ... , &zeromask) ) < 0) {
    if (errno == EINTR) {
        if (intr_flag)
            handle_intr ();
    }
    ...
}

I/O Multiplexing with simultaneous signal handling: 2 in 1.

I/O Multiplexing: System Call pselect


#include <sys/poll.h>

struct pollfd {
    int fd;           /* file descriptor */
    short events;     /* requested events */
    short revents;    /* returned events */
};

int poll(struct pollfd *fdArray, unsigned int fdArrayLen, int timeout);

Event Mask Bits:
#define POLLIN     0x0001   /* There is data to read */
#define POLLPRI    0x0002   /* There is urgent data to read */
#define POLLOUT    0x0004   /* Writing now will not block */
#define POLLERR    0x0008   /* Error condition */
#define POLLHUP    0x0010   /* Hung up */
#define POLLNVAL   0x0020   /* Invalid request: fd not open */
#ifdef _XOPEN_SOURCE
#define POLLRDNORM 0x0040   /* Normal data may be read */
#define POLLRDBAND 0x0080   /* Priority data may be read */
#define POLLWRNORM 0x0100   /* Writing now will not block */
#define POLLWRBAND 0x0200   /* Priority data may be written */
#endif
#ifdef _GNU_SOURCE          /* Linux only */
#define POLLMSG    0x0400
#endif

• The system call poll() is a variation of the system call select().
• The parameter fdArray is an array of structures of length fdArrayLen.
• Each structure in fdArray corresponds to a single file descriptor to be watched for events.
• Requested and returned events are specified by separate event masks (events, revents), constructed from the Event Mask Bits.
• The timeout value is specified as a number of milliseconds and has the following meaning:
 - timeout = -1: wait until an I/O event occurs.
 - timeout > 0: wait no longer than the specified time.
 - timeout = 0: don't wait (non-blocking check, used for polling)
• Return value is the number of descriptors on which an event occurred. The type of event is described by the revents field of the fdArray element that corresponds to the specific descriptor.
• Return value is 0 if the timeout expired, -1 if an error occurred.
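The rules above can be sketched for two descriptors as follows. The helper name poll_two and its return convention are assumptions made for this illustration; the sketch uses the POSIX header <poll.h>.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms milliseconds for either
 * descriptor to become readable. Returns the index (0 or 1) of the first
 * entry with a returned event, -2 on timeout, -1 on error. */
int poll_two(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2];
    int i, nready;

    fds[0].fd = fd_a;
    fds[0].events = POLLIN;       /* requested event: data to read */
    fds[0].revents = 0;
    fds[1].fd = fd_b;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    nready = poll(fds, 2, timeout_ms);
    if (nready < 0)
        return -1;
    if (nready == 0)
        return -2;                /* timeout expired */

    /* POLLHUP and POLLERR may be returned even though they were not requested */
    for (i = 0; i < 2; i++)
        if (fds[i].revents & (POLLIN | POLLHUP | POLLERR))
            return i;
    return -1;
}
```

Unlike select(), the requested masks in events are not overwritten, so the same array can be passed to poll() again without being rebuilt.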

I/O Multiplexing: System Call poll


I/O Models: Blocking, Non-Blocking

1. Blocking I/O Model
[Figure: application / kernel timeline]
• The application issues the system call recvfrom while no datagram is ready.
• The kernel waits for data. When the datagram is ready, it copies the datagram from kernel to user.
• When the copy is complete, recvfrom returns OK and the application processes the datagram.
• The process blocks in the call to recvfrom for the whole time.

2. Non-Blocking I/O Model
[Figure: application / kernel timeline]
• The process repeatedly calls recvfrom, waiting for an OK return (polling).
• While no datagram is ready, every call returns immediately with EWOULDBLOCK.
• Once the datagram is ready, the next call copies the datagram from kernel to user and returns OK when the copy is complete. The application then processes the datagram.
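The polling loop of the Non-Blocking I/O Model can be sketched as below. The helper names set_nonblocking and poll_recv, the retry limit and the return convention are assumptions made for this illustration; a real program would do useful work between the attempts instead of retrying immediately.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Switches a descriptor to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Hypothetical helper: calls recvfrom up to max_tries times on a
 * non-blocking socket. Returns the datagram length, -2 if no datagram
 * arrived during the attempts, -1 on any other error. */
ssize_t poll_recv(int sockfd, void *buf, size_t len, int max_tries)
{
    int i;

    for (i = 0; i < max_tries; i++) {
        ssize_t n = recvfrom(sockfd, buf, len, 0, NULL, NULL);

        if (n >= 0)
            return n;                        /* datagram copied to buf */
        if (errno != EWOULDBLOCK && errno != EAGAIN)
            return -1;                       /* real error */
        /* no datagram ready: the call returned at once instead of blocking */
    }
    return -2;
}
```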


I/O Models: Multiplexing, Signal-Driven

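3. I/O Multiplexing Model
[Figure: application / kernel timeline]
• The application issues the system call select while no datagram is ready. The process blocks in the call to select, waiting for one of the sockets to become readable.
• The kernel waits for data. When the datagram is ready, select returns "readable".
• The application issues the system call recvfrom. The process blocks while the data is copied into the application buffer.
• When the copy is complete, recvfrom returns OK and the application processes the datagram.

4. Signal-Driven I/O Model
[Figure: application / kernel timeline]
• The application issues the system call sigaction to set the SIGIO handler while no datagram is ready. The call returns at once and the process continues executing.
• The kernel waits for data. When the datagram is ready, it delivers SIGIO.
• The SIGIO handler issues the system call recvfrom. The process blocks while the data is copied into the application buffer.
• When the copy is complete, recvfrom returns OK and the application processes the datagram.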


I/O Models: Asynchronous

5. Asynchronous I/O Model

POSIX defines a set of system calls that provide the API for asynchronous I/O on a set of descriptors:

aio_read(), aio_write()
• These functions allow the calling process to initiate a single asynchronous read (write) I/O request.

aio_error()
• This function returns the error status associated with a single asynchronous I/O request. It is equivalent to the errno value that would be set by the corresponding read() or write() system call.
• If the operation has not yet completed, the error status is EINPROGRESS.

aio_return()
• This function returns the result associated with a single asynchronous I/O request after its completion.

lio_listio()
• This function allows the calling process to initiate a list of I/O requests within a single function call.
• Depending on the argument values passed, the function either waits until all I/O is complete or returns immediately after scheduling the requests, with a signal notification to follow.

[Figure: application / kernel timeline]
• The application issues the system call aio_read (and sets the signal handler) while no datagram is ready. The call returns at once and the process continues executing.
• The kernel waits for data. When the datagram is ready, it copies the datagram.
• When the copy is complete, the kernel delivers the signal. The signal handler processes the datagram.


I/O Models Comparison
Each input operation normally has two distinct phases:
• Waiting for the data to be ready
• Copying the data from the kernel to the process

POSIX gives the following definitions of Synchronous and Asynchronous Input / Output:
• A Synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes.
• An Asynchronous I/O operation does not cause the requesting process to be blocked.
Using these definitions, the first four I/O models (Blocking, Non-Blocking, I/O Multiplexing, and Signal-Driven I/O) are all Synchronous because the actual I/O operation blocks the process. Only the Asynchronous I/O model matches the definition of Asynchronous I/O.

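[Figure: the five I/O models across the two phases, "wait for data" and "copy data from kernel to user"]
• Blocking I/O: initiate; blocked during both phases; complete.
• Non-Blocking I/O: check, check, ... (repeated) while waiting for data; blocked during the copy; complete.
• I/O Multiplexing: check; blocked while waiting for data, until ready; initiate; blocked during the copy; complete.
• Signal-Driven I/O: not blocked while waiting for data; on notification, initiate; blocked during the copy; complete.
• Asynchronous I/O: initiate; not blocked in either phase; notification when the copy is complete.
For the first four models the 1st phase is handled differently and the 2nd phase is handled the same (blocked in call to recvfrom). Asynchronous I/O handles both phases.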


Daemon Process
A daemon is a process that runs in the background and is not associated with a controlling terminal.
Many standard UNIX network services (printing, remote login, file transfer, task scheduling with cron) are provided by servers that run as daemons.

Who starts the daemons?

• Daemons started during system startup
These daemons are started by the scripts /etc/rc… and have superuser privileges.
Traditionally the following daemons are started this way:
 - Server cron (starts the specified tasks at their scheduled time)
 - Superserver inetd (listens on multiple sockets and spawns a sub-server per service request)
 - Web Server
 - Mail Server
 - Server syslogd (provides logging services for all running daemons)

• Daemons started by superserver inetd
These are FTP Server, TFTP Server, Telnet Server, Rlogin Server, etc.
These servers are spawned by inetd to handle a specific request and run as daemons.

• Daemons started by cron server
These are the programs configured (in system file /usr/lib/crontab) to be started at a specific scheduled time.
A program can also be scheduled by means of the crontab and at UNIX commands.
All programs started by the cron server at their scheduled time are executed as daemons.

• Programs started from user terminal

How can a process become a daemon?
• Pass to background
• Disassociate from process group
• Disassociate from control terminal
• Close stdin, stdout, stderr and all inherited unnecessary file descriptors
• Reset working directory and file creation mask.


Daemon syslogd
Since daemons do not have a controlling terminal, they cannot use the standard output and standard error streams.
To provide information output from daemons, the standard syslogd daemon is used.

The syslogd daemon life cycle:
• starts during UNIX startup;
• reads its configuration (file /etc/syslog.conf), which specifies where the accepted messages are to be logged;
• opens a Unix Domain socket and binds it to a well-known name (/var/run/log or /dev/log);
• listens for messages from all other daemons and handles them according to the specified configuration.

To communicate with the syslogd daemon, other daemon processes can use the following functions:

#include <syslog.h>
void syslog(int priority, const char *message, ... );
void openlog(const char *ident, int options, int facility);
void closelog( );

• Function syslog() establishes the connection with daemon syslogd (if it is not open yet) and logs the message.
• Parameter message is a format string (as in printf() ) extended with the %m pattern, which is replaced with the error message corresponding to the current value of errno.
• Parameter priority is a combination of 2 values:
 - level (severity) (0=LOG_EMERG: highest severity, …, 7=LOG_DEBUG: lowest severity)
 - facility (functional area) (LOG_CRON, LOG_FTP, LOG_MAIL, LOG_USER, etc.)
The level and facility values are used in the configuration file /etc/syslog.conf to specify where syslogd will forward specific messages. (See the full level and facility list in the man description.)
• Function openlog( ) can be called at the beginning of a process to specify a common prefix ident, a default value of facility and additional options (output to console, print PID, etc.) for all the upcoming messages.
• Function closelog( ) can be called when the application has finished sending log messages.
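A short sketch of how the priority is combined from facility and level, and of the %m pattern. The helper name log_open_failure and the message text are assumptions made for this illustration.

```c
#include <errno.h>
#include <syslog.h>

/* Hypothetical helper: reports that path could not be opened.
 * err is the errno value of the failed call.
 * Returns the priority value that was passed to syslog(). */
int log_open_failure(const char *path, int err)
{
    int priority = LOG_USER | LOG_ERR;   /* facility | level */

    errno = err;                         /* %m expands to the text for errno */
    syslog(priority, "cannot open %s: %m", path);
    return priority;
}
```

A typical caller would run openlog("mydaemon", LOG_PID, LOG_USER) once at startup, call the helper after a failed open(), and call closelog() before exit; the ident "mydaemon" is only an example value.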


Daemon Initialization

#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

int closeAll(void) {…}
/* To find all actually open descriptors, navigate through /proc/self/fd/
   or use system-dependent calls like fcloseall(), closefrom() etc. */

int daemon_init(const char *prog_name, int facility)
{
    pid_t pid;

    if ( (pid = fork()) < 0) {
        return (-1);
    } else if (pid > 0) {
        exit(0);                        /* parent terminates */
    }

    /*--- child 1 continues in background... ----*/
    if (setsid() < 0) {                 /* become session leader, */
        return (-1);                    /* disassociate from control terminal */
    }

    signal(SIGHUP, SIG_IGN);            /* ignore SIGHUP sent when child 1 (session leader) exits */
    if ( (pid = fork()) < 0) {
        return (-1);
    } else if (pid > 0) {
        exit(0);                        /* child 1 terminates */
    }

    /*--- child 2 continues, it is not session leader ... ----*/

    chdir("/");                         /* change working directory */

    closeAll();                         /* close all file descriptors */
    open("/dev/null", O_RDONLY);        /* redirect stdin, stdout, and stderr */
    open("/dev/null", O_RDWR);
    open("/dev/null", O_RDWR);
    openlog(prog_name, LOG_PID, facility);   /* pre-configure syslog output */

    return (0);                         /* initialization success */
}

This example shows a function which "daemonizes" the process.
Some UNIX / LINUX systems provide a daemon() function with the same functionality.

• The first fork passes the 1st child process to the background.
• To disassociate from the control terminal, the process becomes a session (and group) leader.
• The second fork guarantees that the 2nd child is no longer a session leader, so it cannot acquire a control terminal.
• The standard input, output and error are redirected to /dev/null to avoid errors when these descriptors are unexpectedly assigned to files or sockets and printf or perror is then called.
• The working directory and file creation mask can be reset to specific values, depending on the process functionality.


Superserver inetd
This server simultaneously waits for requests for multiple different services on different ports and, when a specific request is accepted, invokes the corresponding service as a separate sub-server process.

The Superserver inetd has the following advantages:
1. It allows a single process inetd to wait for incoming client requests for multiple services, instead of one process for each service. This reduces the total number of processes in the system.
2. It simplifies writing daemon processes, since most of the startup details are handled by inetd.
The "price" for these advantages is the execution of fork() and exec() for every handled request.

The Superserver inetd life cycle:
• Starts during UNIX system startup
• Reads its configuration from file /etc/inetd.conf
• Opens all sockets specified by the configuration and waits for requests on all of them simultaneously
• On accepting a request, spawns ( fork() + exec() ) the specified sub-server to handle that request and passes the actual server name as the first argument (argv[0]) to the spawned process.

The structure of configuration file /etc/inetd.conf :

FIELD                      DESCRIPTION
service-name               must be in /etc/services
socket-type                stream or dgram
protocol                   tcp or udp (must be in /etc/protocols)
wait-flag                  wait (iterative) or nowait (concurrent)
user-name                  from /etc/passwd, typically root
server-program             full pathname to be used for exec
server-program-arguments   arguments for exec

The example of configuration file /etc/inetd.conf :

ftp     stream  tcp  nowait  root    /usr/bin/ftpd     ftpd -1
telnet  stream  tcp  nowait  root    /usr/bin/telnetd  telnetd
login   stream  tcp  nowait  root    /usr/bin/rlogind  rlogind -s
tftp    dgram   udp  wait    nobody  /usr/bin/tftpd    tftpd -s /tftpboot
…


Superserver inetd work schema

1) On startup, inetd reads the /etc/inetd.conf file and creates a socket of the appropriate type (stream or datagram) for each of the specified services. It binds the sockets and, for TCP sockets, also performs listen.

2) Simultaneous waiting for requests on all open sockets is performed with the system call select.

3) For each arrived request a sub-server process is forked. It duplicates the socket descriptor onto stdin, stdout and stderr, sets GID and UID and performs the exec call to the actual server.

4) The parent process at the same time continues to wait for other requests. It also accepts and handles SIGCHLD when sub-servers terminate.

5) For concurrent services (nowait) the corresponding bits in the fd_set bitmask are always set back to ON before the next select call.
For iterative services (wait) the corresponding bit flag is restored only while handling the SIGCHLD signal, after the previous sub-server of the same type has terminated.

Note: Configuring UDP services as nowait can cause a race condition, where:
- the inetd program selects on the socket
- and the server program reads from the same socket.

[Figure: inetd flow chart]
For each service listed in file /etc/inetd.conf:
• socket( )
• bind( )
• listen( ) (TCP only)
• ..add to FD Set..
Then, in a loop:
• select( ) (Read events)
• accept( ) (TCP only)
• fork( )
• is parent ?
 - yes: close accept-ed socket (TCP only); is "wait" service ? yes: temporarily remove FD from FD Set while child is running; no: keep it; return to select( )
 - no (child): close all FDs other than socket; dup socket FD to FDs: 0,1,2; close socket FD; setgid() setuid() (if user not root); exec( ) server
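The fork / dup / exec part of the schema can be sketched as below. This is an illustration under stated assumptions, not the inetd source: the names attach_socket_to_stdio and spawn_subserver are hypothetical, execvp is used so that the program is looked up in PATH, and the steps "close all other FDs" and setgid() / setuid() are left out for brevity.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child-side step: make the socket the standard input, output and error. */
int attach_socket_to_stdio(int sockfd)
{
    int fd;

    for (fd = 0; fd <= 2; fd++)
        if (dup2(sockfd, fd) < 0)
            return -1;
    if (sockfd > 2)
        close(sockfd);               /* the copies 0, 1, 2 stay open */
    return 0;
}

/* Hypothetical helper: forks a sub-server for the connected socket connfd.
 * Returns the child PID in the parent, -1 on error. */
pid_t spawn_subserver(int connfd, const char *program, char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* child: becomes the actual server */
        if (attach_socket_to_stdio(connfd) == 0)
            execvp(program, argv);
        _exit(127);                  /* reached only if dup2 or exec failed */
    }
    close(connfd);                   /* parent: the child owns the connection */
    return pid;
}
```

Because the socket becomes descriptors 0, 1 and 2, the sub-server can be an ordinary program that reads standard input and writes standard output without any socket code of its own.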


Name and Address Conversions
There are four types of network information that an application might want to look up:
• Hosts
• Networks
• Protocols
• Services

Protocol and Service information is always obtained from static files (/etc/protocols, /etc/services).

Host and Network information can be obtained from different sources:
• Domain Name System (DNS)
• Static files (/etc/hosts, /etc/networks)
• Network Information System (NIS)
• Lightweight Directory Access Protocol (LDAP).

The type of name service used by a specific host depends on the configuration provided by the administrator.
The user application, independently of the specific name service configuration, obtains this information using the Resolver: standard functionality that provides the interface to the name service.

[Figure: inside the application, the application code calls the resolver code (function call / function return). The resolver code reads the resolver configuration files and exchanges UDP requests and UDP replies with the local name server, which in turn communicates with other name servers.]

Resolver Functionality
The standard API provides the following conversion methods:
• Hosts information: gethostbyaddr(), gethostbyname()
• Networks information: getnetbyaddr(), getnetbyname()
• Protocols information: getprotobyname(), getprotobynumber()
• Services information: getservbyname(), getservbyport()


Host Information Utilities

#include <netdb.h>

struct hostent {
    char *h_name;          /* official (canonical) name of host */
    char **h_aliases;      /* pointer to array of pointers to alias names */
    int h_addrtype;        /* host address type: AF_INET */
    int h_length;          /* length of address: 4 */
    char **h_addr_list;    /* ptr to array of ptrs with IPv4 addrs */
};

struct hostent *gethostbyname (const char *hostname);
struct hostent *gethostbyaddr (const char *addr, socklen_t len, int family);

• These two methods provide host information by Domain Name or by IP Address respectively.
• Return value is a pointer to a hostent structure on success, NULL on failure (h_errno specifies the error).

Example. Extract IP Address by Host Name
…
char hostName[] = "www.google.com";
struct sockaddr_in servAddr;
struct hostent *pEntry;
…
pEntry = gethostbyname(hostName);
if (pEntry != NULL)
{
    servAddr.sin_family = pEntry->h_addrtype;
    /* copy the first address of the list (it is already in network byte order) */
    bcopy(pEntry->h_addr_list[0],
          (char *) &servAddr.sin_addr,
          pEntry->h_length);
}
…

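[Figure: layout of hostent { }. h_name points to the string "official hostname \0". h_aliases points to a NULL-terminated array of pointers to the strings "alias #1 \0" and "alias #2 \0". h_addrtype holds AF_INET and h_length holds 4. h_addr_list points to a NULL-terminated array of pointers to in_addr{ } structures (IP addr #1, IP addr #2, IP addr #3), each of size h_length=4.]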


Example of gethostbyname() usage.
#include <stdio.h>
#include <stdlib.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int
main(int argc, char **argv)
{
    char *ptr, **pptr;
    char str [INET_ADDRSTRLEN];
    struct hostent *hptr;

    while (--argc > 0) {
        ptr = *++argv;
        if ( (hptr = gethostbyname (ptr) ) == NULL) {
            fprintf (stderr, "gethostbyname error for host: %s, h_errno= %d\n", ptr, h_errno );
            continue;
        }
        printf ("official hostname: %s\n", hptr->h_name);

        for (pptr = hptr->h_aliases; *pptr != NULL; pptr++)
            printf ("\talias: %s\n", *pptr);

        switch (hptr->h_addrtype) {
        case AF_INET:
            pptr = hptr->h_addr_list;
            for ( ; *pptr != NULL; pptr++)    /* each entry points to a binary address */
                printf ("\taddress: %s\n",
                        inet_ntop (hptr->h_addrtype, *pptr, str, sizeof (str)));
            break;

        default:
            fprintf (stderr, "unknown address type\n");    /* errno is not set here */
            break;
        }
    }
    exit(0);
}


Service Information Utilities

#include <netdb.h>

struct servent {
    char *s_name;        /* official service name */
    char **s_aliases;    /* alias list */
    int s_port;          /* port number, network-byte order */
    char *s_proto;       /* protocol to use */
};

struct servent *getservbyname (const char *servname, const char *protoname);
struct servent *getservbyport (int port, const char *protoname);

• These two methods provide service information by service name or by port number respectively.
• Some Internet services are provided using either TCP or UDP. In this case parameter protoname can specify the protocol of interest ("tcp", "udp").
• Return value is a pointer to a servent structure on success, NULL on failure.

Example. Extract Port Number by Service Name
…
struct servent *pEntry;
int tftpPort;

pEntry = getservbyname("tftp", "udp");
if (pEntry != NULL)
{
    tftpPort = pEntry->s_port;    /* network byte order; ntohs() gives host byte order */
}
…


Unix Domain Sockets
The Unix Domain protocols are not an actual protocol suite, but a way of performing Inter-Process Communication on a single host using the socket API.

A Unix Domain socket is bound to a file path.
To connect() to a Unix Domain socket, the process must have the same permissions as are required to open() the file.

Unix Domain Address Structure

#include <sys/un.h>

struct sockaddr_un {
    sa_family_t sun_family;    /* AF_LOCAL (AF_UNIX) */
    char sun_path[108];        /* pathname\0 */
};

Example. Create and Bind Unix Domain Stream Socket.

#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
…
int sockFd;
struct sockaddr_un servAddr;
char filePath[] = "/tmp/anyname";

/* create Unix Domain stream socket */
sockFd = socket (AF_LOCAL, SOCK_STREAM, 0);

/* the file to be used as address must not exist before binding */
unlink(filePath);

/* bind the socket */
bzero((char*)&servAddr, sizeof(servAddr));
servAddr.sun_family = AF_LOCAL;
/* copy at most sizeof(sun_path) - 1 bytes, so the path stays \0-terminated */
strncpy(servAddr.sun_path, filePath, sizeof(servAddr.sun_path) - 1);
bind(sockFd, (struct sockaddr *) &servAddr, sizeof(servAddr));
…
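The same binding steps can be wrapped into a pair of helpers for the server and the client side. This is a minimal sketch; the names fill_addr, unix_listen and unix_connect are assumptions made for this illustration, and error handling is reduced to returning -1.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fills a Unix Domain address; fails if the path does not fit into sun_path. */
static int fill_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;          /* same value as AF_LOCAL */
    strcpy(addr->sun_path, path);
    return 0;
}

/* Server side: socket + unlink + bind + listen. Returns the listening fd or -1. */
int unix_listen(const char *path, int backlog)
{
    struct sockaddr_un addr;
    int fd;

    if (fill_addr(&addr, path) < 0)
        return -1;
    if ( (fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;
    unlink(path);                        /* the file must not exist before bind */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: socket + connect. Returns the connected fd or -1. */
int unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (fill_addr(&addr, path) < 0)
        return -1;
    if ( (fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After unix_listen() the server calls accept() exactly as it would for a TCP socket; the socket file stays in the filesystem until it is removed with unlink().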


Unix Domain Socket Features

Socket-Based Pipe

#include <sys/socket.h>

int socketpair( int family,        /* AF_LOCAL */
                int type,          /* SOCK_STREAM / SOCK_DGRAM */
                int protocol,      /* 0 */
                int sockfd[2] );   /* (output) array of 2 descriptors */

• Creates a pair of already connected unnamed sockets.
• If the SOCK_STREAM type is used, a full-duplex stream pipe is created.
• On success returns 0 and fills sockfd[0] and sockfd[1].
• On error returns -1, errno specifies the error.
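A full-duplex stream pipe is typically shared between a parent and a child process. The sketch below is an illustration only; the function name ask_child_to_upper and the one-byte "protocol" are assumptions made for this example.

```c
#include <ctype.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the parent sends one character to a child over a
 * stream pipe and reads the upper-case answer from the same descriptor.
 * Returns the answer, or -1 on error. */
int ask_child_to_upper(char c)
{
    int sv[2];
    int result = -1;
    char reply;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if ( (pid = fork()) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                          /* child uses sv[1] */
        char ch;

        close(sv[0]);
        if (read(sv[1], &ch, 1) != 1)
            _exit(1);
        ch = (char) toupper((unsigned char) ch);
        if (write(sv[1], &ch, 1) != 1)       /* answer on the same socket */
            _exit(1);
        _exit(0);
    }

    close(sv[1]);                            /* parent uses sv[0] */
    if (write(sv[0], &c, 1) == 1 && read(sv[0], &reply, 1) == 1)
        result = (unsigned char) reply;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

With pipe() two separate pipes would be needed for this exchange, because an ordinary pipe carries data in one direction only.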

Unix Domain & Ancillary Data

• Ancillary Data is control information passed by means of the sendmsg(), recvmsg() system calls.
• The fields msg_control and msg_controllen of the structure msghdr are used for Ancillary Data.
• The following types of Ancillary Data are used with Unix Domain sockets:

Passing Descriptors:
• Sender opens a resource (file) and "sends" the descriptor allocated in its process.
• Receiver "receives" a newly allocated descriptor, pointing to the same resource.

Passing Credentials:
• Sender "sends" a standard structure
• Receiver "receives" it filled with the sender credentials: PID, UID, EUID, GID, etc.

• To see the specific data structures for handling Ancillary Data on a specific Linux/Unix system, see the man pages for the system calls recvmsg and sendmsg.

struct msghdr {
    void * msg_name;             /* address */
    socklen_t msg_namelen;       /* size of address */
    struct iovec * msg_iov;      /* scatter/gather array */
    size_t msg_iovlen;           /* msg_iov array length */
    void * msg_control;          /* ancillary data */
    socklen_t msg_controllen;    /* ancillary data length */
    int msg_flags;               /* flags on received message */
};
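Passing a descriptor can be sketched with the SCM_RIGHTS ancillary data type and the CMSG_* macros. The helper names send_fd and recv_fd are assumptions made for this illustration; one byte of ordinary data is sent along with the control information, and the exact type of msg_controllen differs between systems.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one descriptor, properly aligned. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Sends descriptor fd over Unix Domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    union fd_ctrl ctrl;
    char data = 'F';                     /* ordinary data accompanying the fd */

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    iov.iov_base = &data;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;        /* "passing descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receives a descriptor from sock. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    union fd_ctrl ctrl;
    char data;
    int fd;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &data;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;                           /* newly allocated in the receiver */
}
```

The received number usually differs from the number used by the sender, but both descriptors refer to the same open resource.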


Distributed Application
A Distributed Application is an application made up of distinct components that are physically located on different computer systems, connected by a network.
The components of a Distributed Application are distributed across multiple computers on a network, but seem to be running on the same user's computer.

[Figure: distribution of a Non-Distributed Application. Before: a single Host runs Component A and Component B. After: Host 1 (Distributed Application Client) runs Component A with the Stub of Component B; Host 2 (Distributed Application Server) runs the Stub of Component A with Component B; the stubs exchange a Network Request.]

Distributed Application Service is a call to a remote component with passing of input and output parameters.
Distributed Application Server is responsible for accepting Client requests and providing the called Service.
Distributed Application Client sends requests to the Server and accepts the remote call results.

Distributed Application Design Tasks:
• Discover the desired Server Host
• Discover and connect to the desired Server Process
• Serialization / Deserialization of input/output parameters passed over the network.

Examples of Distributed Application Technologies:
• Sun RPC (Remote Procedure Call)
• CORBA (Common Object Request Broker Architecture)
• Java RMI (Remote Method Invocation)
• Microsoft DCOM (Distributed Component Object Model)


Sun RPC

Sun RPC (Remote Procedure Call) is a powerful technique for constructing distributed, client-server based applications. It allows the execution of individual routines on remote computers across a network.

RPC isolates the application from the physical and logical elements of the data communications, and hides the call to a subroutine on the remote server under the "traditional" local function call interface.

[Figure: RPC Conversion. Non-Distributed Application: Procedure A calls Procedure B. After conversion: the RPC Client contains Procedure A, the Client Stub of Procedure B (Client Interface) and Client Communication; the RPC Server contains Server Communication, the Server Stub of Procedure B (Server Interface) and Procedure B.]

To develop an RPC application the following steps are needed:
• Specify the protocol for client server communication
• Develop the client program
• Develop the server program


RPC Application Development

[Figure: build flow, Before Distribution and After Distribution.
Before Distribution: main.c (program main ( ) calls local procedures) and proc.c (local procedures called from program main()) are linked into prog, the non-distributed program.
After Distribution: P.x (RPC specification XDR file) is processed by rpcgen into P_clnt.c (client stub), P.h (common include file) and P_svc.c (server stub). P_main.c (client main( ) calls client stub procedures) and P_clnt.c are linked with the RPC run-time library into the client program. P_proc.c (server procedures called by server stub) and P_svc.c are linked with the RPC run-time library into the server program.]

• The remote procedure in RPC is identified by a triplet:
 Program ID: unique hexadecimal id of the Server Program,
 Version: version id of the Server Program,
 Procedure ID: serial number of the procedure within the specific Server Program.
• The client-server communication interface is defined using the XDR (eXternal Data Representation) protocol.
 XDR defines the set of serializable data types and the syntax for the Program ID, Version and Procedure ID definition.
• The standard RPC compiler rpcgen compiles XDR interface definitions and builds C code for the interface parts of the Server Stub and Client Stub, and also an H file with common interface constants.
• The RPC run-time library is linked during the build of the Server and Client applications and provides the Client and Server communication functionality.


RPC Simple Example

main.c - before distribution

int printmessage(char* msg);

int main(void)
{
    char msg[] = "test";
    int result;
    result = printmessage (msg);
    return result;
}

proc.c - before distribution

int printmessage(char* msg)
{
    /* print msg */
    return 0;
}

P.x - the XDR definition of interface

program MESSAGEPROG {
    version MESSAGEVERS {
        int PRINTMESSAGE(string) = 1;
    } = 1;
} = 0x20000099;

P.h - generated by rpcgen

#define MESSAGEPROG 0x20000099
#define MESSAGEVERS 1
#define PRINTMESSAGE 1
int * printmessage_1(char**, CLIENT*);
int * printmessage_1_svc(char **, struct svc_req *);

P_main.c - after distribution

#include <rpc/rpc.h>
#include "P.h"

int main(void)
{
    char *msg = "test";       /* passed to the stub by pointer (char **) */
    int *pResult;
    CLIENT *pClnt;            /* Connection Handle */

    pClnt = clnt_create("MyHost", MESSAGEPROG, MESSAGEVERS, "udp");
    if (pClnt == NULL) {      /* server host or program not reachable */
        clnt_pcreateerror("MyHost");
        return 1;
    }
    pResult = printmessage_1(&msg, pClnt);
    if (pResult == NULL)      /* the remote call itself failed */
        clnt_perror(pClnt, "MyHost");
    clnt_destroy(pClnt);
    return 0;
}

P_proc.c - after distribution

#include <rpc/rpc.h>
#include "P.h"

int * printmessage_1_svc(char ** args, struct svc_req * req)
{
    static int result;        /* must be static to return by pointer */
    char * msg = *args;       /* extract the argument passed by pointer */
    /* print msg */
    result = 0;
    return &result;           /* return result by pointer */
}


• Port Mapper is a standard daemon, listening on port 111 UDP (TCP) and maintaining the map (Program ID, Version) -> (Port Number).
• Each RPC Server starts on an ephemeral port and registers with the Port Mapper.
• Each RPC Client calls the Port Mapper on the specific host to obtain the port of the target RPC Server. Then the RPC Client calls the RPC Server with a request containing the Procedure ID and parameters.

Note: RPC does not provide automatic discovery of the Server Host. To use the RPC service, the RPC Client is responsible for knowing the name of the target Server Host where the proper RPC Server is running.

[Figure: Connection to RPC Server. On the Server Host, the RPC Server (Ephemeral Port) with its Server Stub and Procedure registers on startup with the Port Mapper (Port 111), which keeps the Registration Map: Program ID, Version -> RPC Server Port. On the Client Host, the RPC Client with its Client Stub sends Program ID, Version to the Port Mapper and receives the RPC Server Port Number; it then sends Procedure ID, parameters to the RPC Server and receives the Result Data.]


NFS: Network File System
NFS provides transparent access for clients to files and filesystems on a remote server.

NFS accesses only the portions of a file that a process references, and a goal of NFS is to make this access transparent (a user process accesses local and remote filesystems and files in the same way).

When a user process accesses a remote filesystem or file, the local NFS Client sends a request to the remote NFS Server, which performs the requested operation and provides the requested information in its reply.

Before the local NFS Client can access files from the remote NFS Server's filesystem, this remote filesystem must be mounted to a local mount point on the NFS Client's host via the NFS Mount Protocol.

To reference a particular filesystem or file on the remote NFS Server, the NFS Client obtains a File Handle, an opaque object generated by the NFS Server. To perform any following operation on the remote file or filesystem, the Client sends the corresponding File Handle back to the NFS Server.

NFS Client calls are performed by the client kernel, on behalf of client user processes. NFS Servers, for efficiency, are implemented within the server kernel.

NFS implementation is based on RPC.

NFS was originally written to use RPC over UDP. Newer implementations, however, also support RPC over TCP.

[Figure: NFS mount. The server tree / contains /dir123 with file0 and file1. The client tree / contains /var and the mount point /mydir123. After the NFS mount, file1 of the server directory /dir123 is visible under /mydir123 on the client.]


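NFS

A. Mount remote file system

#mkdir /mydir123
#mount hostA:/dir123 /mydir123

(Remote file system /dir123 from host hostA is mounted to mount point directory /mydir123 on local host)

[Figure A: on the client the mount command runs as a user process above the client kernel; on the server the portmapper and the mountd daemon run above the server kernel.]
1. The mountd daemon registers with the portmapper at start.
2. The mount command sends a "get port #" RPC request to the portmapper.
3. The portmapper sends the RPC reply with port #.
4. The mount command sends the mount RPC request to the mountd daemon.
5. The mountd daemon sends the RPC reply with the file handle of the remote filesystem.
6. The mount command issues the mount system call to the client kernel.

B. Access remote file via NFS

#cat /mydir123/file1

(Transparent access from local host to file1 at mounted file system)

[Figure B: in the client kernel, a request of the user process is routed either to local file access (local file processing, local disk) or to the NFS Client (NFS file processing). The NFS Client exchanges RPC requests and RPC replies with the NFS Server (2049/udp,tcp) in the server kernel, which uses local file access to the server's local disk. The port mapper is reached at 111/udp,tcp.]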


Network Management
Network Management is the set of activities, methods, procedures, and tools that relate to the operation, administration, maintenance, and provisioning of network systems.

Operation means keeping the network up and running, including the monitoring of possible problems.
Administration means keeping track of network resources and their assignments.
Maintenance means performing repairs and upgrades of software and hardware components of the network system.
Provisioning means configuring network system resources to support a given service.

[Figure: management layers, from bottom to top: Network Element, Element Management, Network Management, Service Management, Business Management.]

FCAPS: ISO Telecommunications Management Network Model

Example: FCAPS of Telecommunication System

FCAPS is an abbreviation of Fault, Configuration, Accounting, Performance, Security: the areas of Network Management.


Network Management in a “nutshell”

[Figure: the Manager sends a Request to the Agent of the Managed Object; the Agent returns a Response and can also send an Unsolicited Notification. The exchange follows the Network Management Protocols. Manager and Agent each hold an MIB over an address space; the figure labels the two MIBs "partial data" and "full data".]

MIB (Management Information Base)
The database of information maintained by the Agent, which the Manager can query or set.

Network Management Protocol
The protocol between Manager and Agent, describing:
- the common rules of addressing
- the basic data types
- the format of requests, responses and notifications


SNMP – Simple Network Management Protocol

SNMP is an Application Layer (OSI Model, layer 7), UDP-based network protocol.

The SNMP message is called a Protocol Data Unit (PDU).
SNMPv1 provides 5 PDU types. SNMPv2 and SNMPv3 have 2 more PDU types (GetBulkRequest and InformRequest):

• GetRequest - Retrieve the value of a variable or list of variables.
• SetRequest - Change the value of a variable or list of variables.
• GetNextRequest - Retrieve the value of the lexicographically next variable in the MIB. (Walk through the MIB)
• GetBulkRequest - Optimized version of GetNextRequest (SNMPv2)
• Response - Returns variable bindings and acknowledgement for all requests.
• Trap - Asynchronous notification from agent to manager.
• InformRequest - Acknowledged asynchronous notification from manager to manager (SNMPv2).

[Figure: encapsulation. The IP Packet carries the UDP Datagram, which carries the SNMP PDU: SNMP Common Header, SNMP Get/Set Header, SNMP Data.]


SNMP MIB and OIDs
SNMP itself does not define which information (which variables) a managed system should offer. The available information is defined by Management Information Bases (MIBs). MIBs describe the structure of the management data of a device subsystem.

SNMP MIBs use a hierarchical namespace containing Object Identifiers (OID). Each OID identifies a variable that can be read or set via SNMP. The OID Namespace is a hierarchical tree.
The International Telecommunication Union (ITU) Standardization organization maintains the top-level OIDs and delegates responsibility to define OID sub-trees to other organizations.

SNMP MIBs are described by means of the language ASN.1 (Abstract Syntax Notation 1), containing definitions of OID aliases, Object Identifiers and data types.

OID Namespace tree

root
  ccitt(0)
  iso(1)
    org(3)
      dod(6)
        internet(1)
  joint-iso-ccitt(2)
    country(16)
      us(840)
        organization(1)
          motorola(113728)
            gss(1)
              cig(1)
                common(3)      = 2.16.840.1.113728.1.1.3

Fragment of MIB definition in ASN.1 language

CigASN1Module { joint-iso-ccitt(2) country(16) us(840) organization(1) motorola(113728) gss(1) cig(1) common(3) asn1Module(2) 0}

-- alias
cigcom OBJECT IDENTIFIER ::= { joint-iso-ccitt(2) country(16) us(840) organization(1) motorola(113728) gss(1) cig(1) common(3) }

-- object OIDs
cigModule OBJECT IDENTIFIER ::= {cigcom modules(0)}
cigAttribute OBJECT IDENTIFIER ::= {cigcom attributes(1)}
cigGroupAttribute OBJECT IDENTIFIER ::= {cigcom groupAttributes(2)}

-- data types
CigDisplayRadius ::= REAL
CigSiteConfiguration ::= ENUMERATED { omni (0), sixty (1), onetwenty (2), omnisixty (3) }
CigSiteConfigList ::= SEQUENCE OF CigSiteConfiguration

END
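In a program an OID is usually held as an array of its numeric arcs, and the dotted notation is only its text form. A minimal sketch of the conversion follows; the function name oid_to_string is an assumption made for this illustration and is not part of any SNMP library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats n OID arcs as dotted-decimal text.
 * Returns 0 on success, -1 if the output buffer is too small. */
int oid_to_string(const unsigned long *arcs, size_t n, char *out, size_t outlen)
{
    size_t used = 0, i;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* every arc except the first one is preceded by a dot */
        int w = snprintf(out + used, outlen - used,
                         (i == 0) ? "%lu" : ".%lu", arcs[i]);

        if (w < 0 || (size_t) w >= outlen - used)
            return -1;                   /* text was truncated */
        used += (size_t) w;
    }
    return 0;
}
```

For the alias cigcom the arcs are { 2, 16, 840, 1, 113728, 1, 1, 3 }, one number per level of the tree from joint-iso-ccitt(2) down to common(3).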
