Ch11

Network Programming

Interestingly, all network applications are based on the same basic programming model, have similar overall logical structures, and rely on the same programming interface.

11.1 The Client-Server Programming Model

Every network application is based on the client-server model.

The fundamental operation in the client-server model is the transaction.

(Figure: A client-server transaction)

A client-server transaction consists of four steps:

  1. When a client needs service, it initiates a transaction by sending a request to the server. For example, when a Web browser needs a file, it sends a request to a Web server.
  2. The server receives the request, interprets it, and manipulates its resources in the appropriate way. For example, when a Web server receives a request from a browser, it reads a disk file.
  3. The server sends a response to the client and then waits for the next request. For example, a Web server sends the file back to a client.
  4. The client receives the response and manipulates it. For example, after a Web browser receives a page from the server, it displays it on the screen.

It is important to realize that clients and servers are processes and not machines, or hosts as they are often called in this context!

A single host can run many different clients and servers concurrently, and a client and server transaction can be on the same or different hosts. The client-server model is the same, regardless of the mapping of clients and servers to hosts.

11.2 Networks

To a host, a network is just another I/O device that serves as a source and sink for data.

(Figure: network as an I/O device on a host)

11.3 The Global IP Internet

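  • Check your host's IP address (Linux only). -i prints the address(es) the hostname resolves to; -I prints all addresses of all network interfaces:
$ hostname -i
$ hostname -I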
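  • An IP address is stored in an IP address structure:
/* IP address structure */
struct in_addr {
    uint32_t s_addr; /* Address in network byte order (big-endian) */
};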
  • Convert between host byte order and network byte order (big-endian):
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
//Returns: value in network byte order
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
//Returns: value in host byte order

//Note that there are no equivalent functions for manipulating 64-bit values !!!
  • Application programs can convert back and forth between IP addresses and dotted-decimal strings using the functions inet_pton and inet_ntop.
#include <arpa/inet.h>
int inet_pton(int af, const char *src, void *dst);
//Returns: 1 if OK, 0 if src is invalid dotted decimal, −1 on error
const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);
//Returns: pointer to a dotted-decimal string if OK, NULL on error

//the “n” stands for network and the “p” stands for presentation

//af specifies the address family: AF_INET for 32-bit IPv4 addresses, AF_INET6 for 128-bit IPv6 addresses
//size is the size of the dst buffer; INET_ADDRSTRLEN (16 bytes) is enough for an IPv4 string

Practice Problem 11.1

Complete the following table:

Dotted-decimal address | Hex address
107.212.122.205        | __________
64.12.149.13           | __________
107.212.96.29          | __________
__________             | 0x00000080
__________             | 0xFFFFFF00
__________             | 0x0A010140

My solution (checked):

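#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define N 3

int main(void) {
    const char *addrs[N] = {
        "107.212.122.205",
        "64.12.149.13",
        "107.212.96.29"
    };

    uint32_t hexs[N] = {
        0x00000080,
        0xFFFFFF00,
        0x0A010140
    };

    char buf[INET_ADDRSTRLEN];   /* 16 bytes: enough for "255.255.255.255\0" */

    /* Dotted decimal -> hex: inet_pton stores the address in network byte
       order, so convert it to host order before printing it as an integer. */
    for (int i = 0; i < N; i++) {
        uint32_t temp;
        if (inet_pton(AF_INET, addrs[i], &temp) != 1) {
            fprintf(stderr, "invalid address: %s\n", addrs[i]);
            continue;
        }
        printf("%s\t: 0x%08x\n", addrs[i], ntohl(temp));
    }

    /* Hex -> dotted decimal: inet_ntop expects network byte order. */
    for (int i = 0; i < N; i++) {
        uint32_t temp = htonl(hexs[i]);
        if (inet_ntop(AF_INET, &temp, buf, sizeof(buf)) == NULL) {
            perror("inet_ntop");
            continue;
        }
        printf("%s\t: 0x%08x\n", buf, hexs[i]);
    }
    return 0;
}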
$ ./net
107.212.122.205 : 0x6bd47acd
64.12.149.13 : 0x400c950d
107.212.96.29 : 0x6bd4601d
0.0.0.128 : 0x00000080
255.255.255.0 : 0xffffff00
10.1.1.64 : 0x0a010140

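Be extremely careful with byte order when processing network data: integers you compute on the host are in host byte order, but the addresses and ports stored in socket address structures must be in network byte order!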

11.3.2 Internet Domain Names

The set of domain names forms a hierarchy, and each domain name encodes its position in the hierarchy.

(Figure: domain name hierarchy)

  • The nodes of the tree represent domain names that are formed by the path back to the root. Subtrees are referred to as subdomains.

11.3.3 Internet Connections

Internet clients and servers communicate by sending and receiving streams of bytes over connections.

  • A connection is point-to-point in the sense that it connects a pair of processes.

  • It is full duplex in the sense that data can flow in both directions at the same time.

  • And it is reliable in the sense that, barring some catastrophic failure such as a cable cut by the proverbial careless backhoe operator, the stream of bytes sent by the source process is eventually received by the destination process in the same order it was sent.

  • A socket is an end point of a connection.

    • Each socket has a corresponding socket address that consists of an Internet address and a 16-bit integer port and is denoted by the notation address:port.
    • The port in the client’s socket address is assigned automatically by the kernel when the client makes a connection request and is known as an ephemeral port.
    • The port in the server’s socket address is typically a well-known port associated with the service.
    • The mapping between well-known names and well-known ports is contained in a file called /etc/services (e.g., http → 80, https → 443).
  • A connection is uniquely identified by the socket addresses of its two end points. This pair of socket addresses is known as a socket pair and is denoted by the tuple (cliaddr:cliport, servaddr:servport).

(Figure: socket)

11.4 The Sockets Interface

The sockets interface is a set of functions that are used in conjunction with the Unix I/O functions to build network applications.

(Figure: overview of the sockets interface)

11.4.1 Socket Address Structures

  • From the perspective of the Linux kernel, a socket is an end point for communication.

  • From the perspective of a Linux program, a socket is an open file with a corresponding descriptor.

/* IP socket address structure */
struct sockaddr_in {
uint16_t sin_family; /* Protocol family (always AF_INET) */
uint16_t sin_port; /* Port number in network byte order */
struct in_addr sin_addr; /* IP address in network byte order */
unsigned char sin_zero[8]; /* Pad to sizeof(struct sockaddr) */
};
/* Generic socket address structure (for connect, bind, and accept) */
struct sockaddr {
uint16_t sa_family; /* Protocol family */
char sa_data[14]; /* Address data */
};

11.4.2 The socket Function

Clients and servers use the socket function to create a socket descriptor.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
//Returns: nonnegative descriptor if OK, −1 on error
  • The descriptor returned by socket (called clientfd on the client side) is only partially opened and cannot yet be used for reading and writing.
  • How we finish opening the socket depends on whether we are a client or a server.

11.4.3 The connect Function

A client establishes a connection with a server by calling the connect function.

#include <sys/socket.h>
int connect(int clientfd, const struct sockaddr *addr,
socklen_t addrlen);
//Returns: 0 if OK, −1 on error
  • The connect function attempts to establish an Internet connection with the server at socket address addr, where addrlen is sizeof(struct sockaddr_in).

  • The connect function blocks until either the connection is successfully established or an error occurs. If it succeeds, clientfd is ready for reading and writing.

  • As with socket, the best practice is to use getaddrinfo to supply the arguments to connect.

11.4.4 The bind Function

The remaining sockets functions—bind, listen, and accept—are used by servers to establish connections with clients.

#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
//Returns: 0 if OK, −1 on error

The bind function asks the kernel to associate the server’s socket address in addr with the socket descriptor sockfd. The addrlen argument is sizeof(struct sockaddr_in). As with socket and connect, the best practice is to use getaddrinfo to supply the arguments to bind.

11.4.5 The listen Function

By default, the kernel assumes that a descriptor created by the socket function corresponds to an active socket that will live on the client end of a connection. A server calls the listen function to tell the kernel that the descriptor will be used by a server instead of a client.

#include <sys/socket.h>
int listen(int sockfd, int backlog);
//Returns: 0 if OK, −1 on error
  • The listen function converts sockfd from an active socket to a listening socket that can accept connection requests from clients.
  • The backlog argument is a hint about the number of outstanding connection requests that the kernel should queue up before it starts to refuse requests. It is typically set to a large value, such as 1024.

11.4.6 The accept Function

Servers wait for connection requests from clients by calling the accept function.

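#include <sys/socket.h>
int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);
//Returns: nonnegative connected descriptor if OK, −1 on error

The accept function waits for a connection request from a client to arrive on listenfd, fills in the client’s socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions.

Why two different descriptors?

Because as a server, we want to serve many clients over the lifetime of the server, possibly several at the same time.

  • The listening descriptor (listenfd) is created once and exists for the lifetime of the server; the server uses it only to accept connection requests.
  • Each call to accept() creates a new connected descriptor (connfd), which the server uses to communicate with that particular client; it exists only while the server is servicing that client.
  • All connected descriptors share the same server port, but each one identifies a different connection (a different socket pair).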

(Figure: listening and connected descriptors)

11.4.7 Host and Service Conversion

  • Linux provides some powerful functions, called getaddrinfo and getnameinfo, for converting back and forth between binary socket address structures and the string representations of hostnames, host addresses, service names, and port numbers.
  • When used in conjunction with the sockets interface, they allow us to write network programs that are independent of any particular version of the IP protocol.

The getaddrinfo Function

(It does the name lookup for us, consulting sources such as /etc/hosts and DNS.)

The getaddrinfo function converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
int getaddrinfo(
const char *host, //domain name or numeric address (either host or service may be NULL, not both)
const char *service, //service name such as "http", or a decimal port number
const struct addrinfo *hints, //optional input that controls which addresses are returned (may be NULL)
struct addrinfo **result //output: head of a linked list of addrinfo structures
);
//Returns: 0 if OK, nonzero error code on error
void freeaddrinfo(struct addrinfo *result);
//Returns: nothing
const char *gai_strerror(int errcode);
//Returns: error message
  • Given host and service (the two components of a socket address), getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a socket address structure that corresponds to host and service
  • After a client calls getaddrinfo, it walks this list, trying each socket address in turn until the calls to socket and connect succeed and the connection is established. Similarly, a server tries each socket address on the list until the calls to socket and bind succeed.
  • To avoid memory leaks, the application must eventually free the list by calling freeaddrinfo
  • If getaddrinfo returns a nonzero error code, the application can call gai_strerror to convert the code to a message string.

(Figure: data structure returned by getaddrinfo)

  • The optional hints argument is an addrinfo structure (Figure 11.16) that provides finer control over the list of socket addresses that getaddrinfo returns.

Structure of each list element (struct addrinfo):

struct addrinfo {
int ai_flags; /* Hints argument flags */
int ai_family; /* First arg to socket function */
int ai_socktype; /* Second arg to socket function */
int ai_protocol; /* Third arg to socket function */
char *ai_canonname; /* Canonical hostname */
size_t ai_addrlen; /* Size of ai_addr struct */
struct sockaddr *ai_addr; /* Ptr to socket address structure */
struct addrinfo *ai_next; /* Ptr to next item in linked list */
};
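A minimal sketch of calling getaddrinfo and walking the returned list. To keep the output deterministic it passes a numeric host and port with the AI_NUMERICHOST and AI_NUMERICSERV flags, so no DNS lookup happens; replacing "127.0.0.1" with a domain name and dropping AI_NUMERICHOST would make it resolve real names. The host and port values are arbitrary examples.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(void) {
    struct addrinfo hints, *listp, *p;
    char buf[INET_ADDRSTRLEN];

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;                         /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;                   /* connections only */
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no name lookups */

    int rc = getaddrinfo("127.0.0.1", "15213", &hints, &listp);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo error: %s\n", gai_strerror(rc));
        return 1;
    }

    /* Walk the linked list; each node points to a socket address. */
    for (p = listp; p != NULL; p = p->ai_next) {
        struct sockaddr_in *sa = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sa->sin_addr, buf, sizeof(buf));
        printf("%s:%u\n", buf, (unsigned)ntohs(sa->sin_port));
    }

    freeaddrinfo(listp);   /* free the list to avoid a memory leak */
    return 0;
}
```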

The getnameinfo Function

The getnameinfo function is the inverse of getaddrinfo. It converts a socket address structure to the corresponding host and service name strings.

#include <sys/socket.h>
#include <netdb.h>
int getnameinfo(const struct sockaddr *sa, socklen_t salen,
char *host, size_t hostlen,
char *service, size_t servlen, int flags);
//Returns: 0 if OK, nonzero error code on error

11.4.9 Example Echo Client and Server

The best way to learn the sockets interface is to study example code.

Echo Client

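#include "csapp.h"

int main(int argc, char **argv)
{
    int clientfd;
    char *host, *port, buf[MAXLINE];
    rio_t rio;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
        exit(0);
    }
    host = argv[1];
    port = argv[2];

    /* Connect to the server (Open_clientfd wraps getaddrinfo, socket, and connect) */
    clientfd = Open_clientfd(host, port);
    Rio_readinitb(&rio, clientfd);

    /* Send each line typed on stdin, then print the line the server echoes back */
    while (Fgets(buf, MAXLINE, stdin) != NULL) {
        Rio_writen(clientfd, buf, strlen(buf));
        Rio_readlineb(&rio, buf, MAXLINE);
        Fputs(buf, stdout);
    }

    /* EOF on stdin (Ctrl-D): close the connection and exit */
    Close(clientfd);
    exit(0);
}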
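The client relies on Open_clientfd from csapp.h. The sketch below is a simplified, hypothetical open_clientfd written for these notes (not the csapp.c source): it walks the getaddrinfo list, trying socket and connect on each address until one succeeds. To stay self-contained, main starts a throwaway IPv4 listener on 127.0.0.1 with a kernel-chosen port (found with getsockname) and connects to it.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: connect to host:port, return a connected fd or -1. */
static int open_clientfd(const char *host, const char *port) {
    struct addrinfo hints, *listp, *p;
    int clientfd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;   /* open a connection */
    hints.ai_flags = AI_NUMERICSERV;   /* port is a numeric string */
    if (getaddrinfo(host, port, &hints, &listp) != 0)
        return -1;

    /* Walk the list until socket and connect both succeed. */
    for (p = listp; p != NULL; p = p->ai_next) {
        clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (clientfd < 0)
            continue;                  /* try the next address */
        if (connect(clientfd, p->ai_addr, p->ai_addrlen) == 0)
            break;                     /* success */
        close(clientfd);               /* connect failed: close and try the next one */
        clientfd = -1;
    }
    freeaddrinfo(listp);
    return clientfd;
}

int main(void) {
    /* Throwaway listener on 127.0.0.1 with a kernel-chosen port. */
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof(sa);
    if (listenfd < 0 || bind(listenfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(listenfd, 1) < 0 || getsockname(listenfd, (struct sockaddr *)&sa, &len) < 0) {
        perror("listener setup");
        return 1;
    }

    char port[16];
    snprintf(port, sizeof(port), "%u", (unsigned)ntohs(sa.sin_port));

    int fd = open_clientfd("127.0.0.1", port);
    printf("%s\n", fd >= 0 ? "connected" : "connection failed");
    if (fd >= 0)
        close(fd);
    close(listenfd);
    return fd >= 0 ? 0 : 1;
}
```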

Echo Server

#include "csapp.h"

void echo(int connectfd);

int main(int argc, char **argv)
{
    int listenfd, connectfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr; /* Enough space for any address (IPv4 or IPv6) */
    char clienthostname[MAXLINE], clientport[MAXLINE];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(0);
    }

    listenfd = Open_listenfd(argv[1]);
    while (1) {
        /* clientlen is a value-result argument: reset it before every accept */
        clientlen = sizeof(struct sockaddr_storage);
        connectfd = Accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        Getnameinfo((struct sockaddr *)&clientaddr, clientlen, clienthostname, MAXLINE,
                    clientport, MAXLINE, 0);
        printf("Accepted connection from (%s:%s)\n", clienthostname, clientport);
        echo(connectfd);
        Close(connectfd);
    }
    exit(0);
}

/* Read text lines from connectfd and echo them back until the client closes the connection */
void echo(int connectfd)
{
    ssize_t n;
    char buf[MAXLINE];
    rio_t rio;

    Rio_readinitb(&rio, connectfd);
    while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
        printf("server received %zd bytes from client\n", n);
        Rio_writen(connectfd, buf, n);
    }
}

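Test your server (assuming it was started on port 15213):

$ telnet localhost 15213

Note that telnet sends everything, including passwords, in cleartext, so it is insecure: it is fine for poking at a local test server like this one, but you should NEVER use it for real remote logins!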


11.5 Web Servers

11.5.1 Web Basics

11.5.2 Web Content

To Web clients and servers, content is a sequence of bytes with an associated MIME (Multipurpose Internet Mail Extensions) type.

(Figure: example MIME types)

Web servers provide content to clients in two different ways:

(This view is somewhat dated: nowadays client-side languages such as JavaScript also execute code in the browser and interact with the server, although servers still deliver both kinds of content.)

  • Fetch a disk file and return its contents to the client.

    (The disk file is known as static content and the process of returning the file to the client is known as serving static content.)

  • Run an executable file and return its output to the client.

    (The output produced by the executable at run time is known as dynamic content, and the process of running the program and returning its output to the client is known as serving dynamic content.)

Every piece of content returned by a Web server is associated with some file that it manages. Each of these files has a unique name known as a URL (universal resource locator).

  • URLs for executable files can include program arguments after the filename. A ? character separates the filename from the arguments, and each argument is separated by an & character.

    • For example, http://bluefish.ics.cs.cmu.edu:8000/cgi-bin/adder?15000&213 specifies an executable file called adder and gives it two arguments, 15000 and 213.
  • Clients and servers use different parts of the URL during a transaction.

    • Clients use the prefix, such as http://www.google.com:80, to determine what kind of server to contact, where the server is, and what port it is listening on.
    • Servers use the suffix, such as /index.html, to find the file on their filesystem and to determine whether the request is for static or dynamic content.

Here are some points about how servers interpret the suffix of a URL:

  • There are no standard rules for determining whether a URL refers to static or dynamic content.
    • Each server has its own rules for the files it manages.
    • A classic (old-fashioned) approach is to identify a set of directories, such as cgi-bin, where all executables must reside.
  • The initial / in the suffix does NOT denote the Linux root directory
    • Rather, it denotes the home directory for whatever kind of content is being requested.
    • For example, a server might be configured so that all static content is stored in directory /usr/httpd/html and all dynamic content is stored in directory /usr/httpd/cgi-bin.
    • The minimal URL suffix is the / character, which all servers expand to some default home page such as /index.html.
    • This explains why it is possible to fetch the home page of a site by simply typing a domain name to the browser. The browser appends the missing / to the URL and passes it to the server, which expands the / to some default filename.

11.5.3 HTTP Transactions

(Figure: example HTTP transaction using telnet)

HTTP Requests

An HTTP request consists of

  • a request line
    • A request line has the form method URI version
    • HTTP supports a number of different methods, including GET, POST, OPTIONS, HEAD, PUT, DELETE, and TRACE
  • followed by zero or more request headers
    • Request headers provide additional information to the server, such as the brand name of the browser or the MIME types that the browser understands.
    • Each header has the form header-name: header-data
  • followed by an empty text line that terminates the list of headers

HTTP Responses

(Figure: some HTTP status codes)

An HTTP response consists of

  • a response line
    • A response line has the form version status-code status-message
  • followed by zero or more response headers
  • followed by an empty line that terminates the headers
  • followed by the response body

11.5.4 Serving Dynamic Content

If we stop to think for a moment about how a server might provide dynamic content to a client, lots of questions arise. The standard solution is CGI (Common Gateway Interface).

How Does the Client Pass Program Arguments to the Server?

Arguments for GET requests are passed in the URI. As we have seen, a ‘?’ character separates the filename from the arguments, and each argument is separated by an ‘&’ character.

How Does the Server Pass Arguments to the Child?

After a server receives a request such as GET /cgi-bin/adder?15000&213 HTTP/1.1, it calls fork to create a child process and calls execve to run the /cgi-bin/adder program in the context of the child. Programs like the adder program are often referred to as CGI programs because they obey the rules of the CGI standard. Before the call to execve, the child process sets the CGI environment variable QUERY_STRING to 15000&213, which the adder program can reference at run time using the Linux getenv function.

How Does the Server Pass Other Information to the Child?

CGI defines a number of other environment variables that a CGI program can expect to be set when it runs.

(Figure: examples of CGI environment variables)

Where Does the Child Send Its Output?

A CGI program sends its dynamic content to the standard output.

Before the child process loads and runs the CGI program, it uses the Linux dup2 function to redirect standard output to the connected descriptor that is associated with the client. Thus, anything that the CGI program writes to standard output goes directly to the client.

Notice that since the parent does not know the type or size of the content that the child generates, the child is responsible for generating the Content-type and Content-length response headers, as well as the empty line that terminates the headers.

11.6 Putting It Together: The Tiny Web Server

Summary