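This post documents the implementation of a simple proxy server built on C sockets (CS:APP Lab 10, the proxy lab).
The proxy supports keep-alive connections and saves a record of each access to a log file.
GitHub: https://github.com/He11oLiu/proxy
The post is divided into the following parts: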
- CS:APP's requirements for the server
- A first version of the proxy (HTTP/1.0)
- The HTTP protocol, and modifying the handler so that it supports keep-alive
- Optimizing readn and writen
The lab handout asks that thread-unsafe functions such as inet_ntoa, gethostbyname, and gethostbyaddr not be called inside a thread. In particular, the open_clientfd function in csapp.c is thread-unsafe because it calls gethostbyaddr, a Class-3 thread-unsafe function (CS:APP 13.7.1). You will need to write a thread-safe version of open_clientfd, called open_clientfd_ts, that uses the lock-and-copy technique (CS:APP 13.7.1) when it calls gethostbyaddr.
The Rio_readn, Rio_readlineb, and Rio_writen error checking wrappers in csapp.c are not appropriate for a realistic proxy because they terminate the process when they encounter an error. Instead, you should write new wrappers called Rio_readn_w, Rio_readlineb_w, and Rio_writen_w that simply return after printing a warning message when I/O fails. When either of the read wrappers detects an error, it should return 0, as though it encountered EOF on the socket.
Reads and writes can fail for a variety of reasons. The most common read failure is an errno = ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an errno = EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur, for example, when a user hits their browser’s Stop button during a long transfer.
Writing to such a closed connection can also raise a SIGPIPE signal, whose default action is to terminate the process. To keep your proxy from crashing you can use the SIG_IGN argument to the signal function (CS:APP 8.5.3) to explicitly ignore these SIGPIPE signals.
Implementing a Sequential Web Proxy
Proxy lab prototype
Server framework
int main(int argc, char **argv){
    int lisenfd, port;
    unsigned int clientlen;
    clientinfo* client;

    /* Ignore SIGPIPE */
    Signal(SIGPIPE, SIG_IGN);
    if (argc != 2){
        fprintf(stderr, "usage:%s <port>\n", argv[0]);
        exit(1);
    }
    port = atoi(argv[1]);
    /* open log file */
    logfile = fopen("proxylog","w");
    lisenfd = Open_listenfd(port);
    while (1){
        /* Allocate a new memory area to pass arguments to doit */
        /* It will be freed by doit */
        client = (clientinfo*)Malloc(sizeof(clientinfo));
        /* accept() overwrites clientlen, so reset it before every call */
        clientlen = sizeof(struct sockaddr_in);
        client->socketfd = Accept(lisenfd, (SA *)&client->clientaddr, &clientlen);
        printf("Client %s connected\n",inet_ntoa(client->clientaddr.sin_addr));
        doit(client);
    }
    return 0;
}
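As a first version, I built an iterative server rather than a concurrent one. The framework of this kind of server is relatively simple. This part mainly tests the understanding of the proxy's basic function, and it only handles one connected client at a time.
The server framework can be reduced to the following, where doit() is the function that actually handles a client request.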
init_server();
while(1){
    accept();
    doit();
}
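doit(): handling the client's request
For a proxy, the processing steps are clear: read the HTTP request from the client socket, parse the uri, send an HTTP request to the target server, and relay the response back to the client.
/* doit */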
void doit(clientinfo *client){
    int serverfd;
    char buf[MAXLINE],method[MAXLINE],uri[MAXLINE],version[MAXLINE];
    char hostname[MAXLINE],pathname[MAXLINE];
    int port;
    char errorstr[MAXLINE];
    char logstring[MAXLINE];
    rio_t rio;
    ssize_t len = 0;
    int resplen = 0;

    /* init args */
    Rio_readinitb(&rio,client->socketfd);
    Rio_readlineb(&rio,buf,MAXLINE);
    sscanf(buf,"%s %s %s",method,uri,version);
    if(strcmp(method,"GET")){
        fprintf(stderr, "error request\n");
        sprintf(errorstr,"%s Not Implemented",method);
        clienterror(client->socketfd, method, "501","Not Implemented", errorstr);
        Close(client->socketfd);
        free(client); /* every exit path must release the clientinfo */
        return;
    }
    if(parse_uri(uri,hostname,pathname,&port)!=0){
        fprintf(stderr, "parse error\n");
        clienterror(client->socketfd, method, "400","uri error","URI error");
        Close(client->socketfd);
        free(client);
        return;
    }
#ifdef DEBUG
    printf("Finish parse %s %s %s %d\n",uri,hostname,pathname,port);
#endif
    /* connect to server */
    if((serverfd=open_clientfd(hostname,port))<0){
        printf("Cannot connect to server %s %d\n",hostname,port);
        clienterror(client->socketfd, method, "302","Server not found", "Server not found");
        Close(client->socketfd);
        free(client);
        return;
    }
    /* generate and push the request to server */
    if(pathname[0]=='\0') strcpy(pathname,"/");
    if(strcmp("HTTP/1.0",version)!=0) printf("Only support HTTP/1.0\n");
    sprintf(buf,"%s %s HTTP/1.0\r\n",method, pathname);
    Rio_writen(serverfd,buf,strlen(buf));
    sprintf(buf,"Host: %s\r\n",hostname);
    Rio_writen(serverfd,buf,strlen(buf));
    sprintf(buf,"\r\n");
    Rio_writen(serverfd,buf,strlen(buf));
    /* receive the response from server */
    Rio_readinitb(&rio, serverfd);
    /* Parenthesize the assignment: without it, len receives the result of the comparison */
    while((len = rio_readnb(&rio, buf, MAXLINE)) > 0){
        /* forward and count only the bytes actually read */
        Rio_writen(client->socketfd, buf, len);
        resplen += len;
    }
    format_log_entry(logstring, &client->clientaddr, uri, resplen);
    fprintf(logfile, "%s\n", logstring);
    close(client->socketfd);
    close(serverfd);
    /* free the clientinfo space */
    free(client);
}
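Here I ran into the second problem in the Q&A: HTTP/1.1 cannot be supported.
I tried pointing the browser at this proxy directly in the system settings. Web pages usually send HTTP/1.1 requests, so the proxy can still get stuck in read, and this needs special handling.
One option is to forward the client's request as-is, as in the snippet that follows. But the headers sent by the browser ask for keep-alive, which leaves read unusable, so I gave up on this approach.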
/* Or just copy the HTTP request from client */
Rio_writen_w(serverfd, buf, strlen(buf));
while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
    Rio_writen_w(serverfd, buf,len);
    if (!strcmp(buf, "\r\n")) /* End of request */
        break;
    memset(buf,0,MAXLINE);
}
A small bug in parse_uri
hostend = strpbrk(hostbegin, " :/\r\n\0");
/* when none of the delimiters follows the host (no ':' or '/'), hostend may be NULL */
if(hostend == NULL) hostend = hostbegin + strlen(hostbegin);
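To make the NULL case concrete, here is a minimal sketch of a URI splitter that applies the same fallback. It is not the lab's parse_uri: the name split_http_uri, the explicit buffer sizes, and the choice to keep the leading "/" in the path are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper (not the lab's parse_uri): splits
 * "http://host[:port][/path]" into host, port and path.
 * Returns 0 on success, -1 on a malformed or oversized URI. */
int split_http_uri(const char *uri, char *host, size_t hostcap,
                   char *path, size_t pathcap, int *port)
{
    if (strncasecmp(uri, "http://", 7) != 0)
        return -1;
    const char *hostbegin = uri + 7;

    /* strpbrk returns NULL when no delimiter follows the host,
     * e.g. for "http://www.example.com"; fall back to the string end. */
    const char *hostend = strpbrk(hostbegin, " :/\r\n");
    if (hostend == NULL)
        hostend = hostbegin + strlen(hostbegin);

    size_t hostlen = (size_t)(hostend - hostbegin);
    if (hostlen == 0 || hostlen >= hostcap)
        return -1;
    memcpy(host, hostbegin, hostlen);
    host[hostlen] = '\0';

    /* Default port, overridden by an explicit ":port". */
    *port = 80;
    const char *p = hostend;
    if (*p == ':') {
        *port = atoi(p + 1);
        p += 1 + strspn(p + 1, "0123456789");
    }

    /* An absent path is reported as "/". */
    const char *pathbegin = (*p == '/') ? p : "/";
    size_t pathlen = strcspn(pathbegin, " \r\n");
    if (pathlen >= pathcap)
        return -1;
    memcpy(path, pathbegin, pathlen);
    path[pathlen] = '\0';
    return 0;
}
```

Passing the buffer capacities explicitly is a design choice of this sketch; it avoids the unbounded strcpy that a fixed MAXLINE buffer otherwise relies on.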
Set up the HTTP proxy
Try connecting
Dealing with multiple requests concurrently
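Supporting multiple threads is very simple; the slightly trickier part is the mutex handling that comes afterwards.
First, write a new thread handler function.
void *thread_handler(void *arg){
    doit((clientinfo*)arg);
    return NULL;
}
Then replace the original call to doit with
Pthread_create(&thread, NULL, thread_handler, client);
Pthread_detach(thread);
The server framework now looks like this:
main(){
    init_server();
    while(1){
        accept();
        create_newThread(handler,arg);
    }
}
// what each thread does
handler(arg){
    initThread();
    doit(arg);
}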
Because sem_init is marked __deprecated on macOS, unnamed (in-memory) semaphores can no longer be used there. Here the named sem_open is used in place of sem_init.
/* Mutex semaphores */
sem_t *mutex_host, *mutex_file;
/* sem_open reports failure with SEM_FAILED, not NULL */
if((mutex_host = sem_open("mutex_host",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}
if((mutex_file = sem_open("mutex_file",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}
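The handout points out that open_clientfd calls gethostbyname, so the mutex must be acquired before that call; hence the new open_clientfd.
csapp.c already wraps the P and V primitives, so they can be called directly.
The original open_clientfd is implemented as follows. Adding the P and V primitives at the commented-out places is enough to guarantee that only one thread is in the critical section at a time. Note that the early return must release the mutex as well.
/* Fill in the server's IP address and port */
bzero((char *) &serveraddr, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
//P(mutex_host);
if ((hp = gethostbyname(hostname)) == NULL) {
    //V(mutex_host); /* release before the early return */
    return -2; /* check h_errno for cause of error */
}
bcopy((char *)hp->h_addr_list[0],
      (char *)&serveraddr.sin_addr.s_addr, hp->h_length);
serveraddr.sin_port = htons(port);
//V(mutex_host);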
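The lock-and-copy idea can also be shown in isolation. The sketch below uses a pthread mutex rather than the csapp P/V wrappers, and the function name resolve_ipv4_ts is an assumption for illustration. The result of gethostbyname lives in static storage, so it is copied into caller-owned memory before the lock is released.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

/* One lock shared by every caller of gethostbyname in this sketch. */
static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: resolve hostname to an IPv4 address.
 * Lock, call the thread-unsafe function, copy the result out of the
 * shared static buffer, then unlock. Returns 0 on success, -1 on failure. */
int resolve_ipv4_ts(const char *hostname, struct in_addr *out)
{
    int rc = -1;

    pthread_mutex_lock(&host_lock);
    struct hostent *hp = gethostbyname(hostname);
    if (hp != NULL && hp->h_addrtype == AF_INET &&
        hp->h_length == (int)sizeof(*out)) {
        /* Copy while still holding the lock; another thread's call
         * could overwrite *hp as soon as the lock is released. */
        memcpy(out, hp->h_addr_list[0], sizeof(*out));
        rc = 0;
    }
    /* Single exit path, so the lock is released on failure as well. */
    pthread_mutex_unlock(&host_lock);
    return rc;
}
```

On current systems getaddrinfo is reentrant and would avoid the lock altogether; the sketch keeps gethostbyname only because that is what the lab's open_clientfd uses.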
The log file is handled in a similar way:
format_log_entry(logstring, &client->clientaddr, uri, resplen);
P(mutex_file);
fprintf(logfile, "%s\n", logstring);
V(mutex_file);
So that the log file can be opened and read while the server is running, the file operations are changed to the following:
format_log_entry(logstring, &client->clientaddr, uri, resplen);
P(mutex_file);
logfile = fopen("proxy.log","a");
fprintf(logfile, "%s\n", logstring);
fclose(logfile);
V(mutex_file);
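The open/append/close pattern can be wrapped in one function that also checks whether fopen succeeded, which the snippet above does not. This is a sketch under assumptions: it uses a pthread mutex instead of the named semaphore, and log_append is a hypothetical name.

```c
#include <pthread.h>
#include <stdio.h>

/* One lock serializes all writers within this process. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: append one line to the log at `path`.
 * The file is opened and closed on every call so that it can be
 * inspected while the server keeps running.
 * Returns 0 on success, -1 on failure. */
int log_append(const char *path, const char *entry)
{
    int rc = -1;

    pthread_mutex_lock(&log_lock);
    FILE *fp = fopen(path, "a");
    if (fp != NULL) {
        int written = fprintf(fp, "%s\n", entry) >= 0;
        /* fclose flushes the buffer; a failed flush is a failed write. */
        if (fclose(fp) == 0 && written)
            rc = 0;
    }
    pthread_mutex_unlock(&log_lock);
    return rc;
}
```

Reopening the file for every entry costs a few system calls per request; for a lab-sized proxy that seems an acceptable trade for being able to read the log at any time.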
A global variable records the current thread id, which is passed on through clientinfo.
/* thread id */
unsigned long tid = 0;
printf("Client %s connected tid = %lu\n",inet_ntoa(client->clientaddr.sin_addr),tid);
client->tid = tid ++;
Pthread_create(&thread, NULL, thread_handler, client);
Rio_xxx_w
Rio_writen and Rio_readnb call unix_error as soon as they hit an error, which terminates the process. To keep the server running, they need to be changed to print the error and return.
void Rio_writen_w(int fd, void *usrbuf, size_t n){
    if (rio_writen(fd, usrbuf, n) != n)
        printf("Rio_writen_w error\n");
}
ssize_t Rio_readnb_w(rio_t *rp, void *usrbuf, size_t n){
    ssize_t rc;
    if ((rc = rio_readnb(rp, usrbuf, n)) < 0) {
        printf("Rio_readnb_w error\n");
        rc = 0;
    }
    return rc;
}
ssize_t Rio_readlineb_w(rio_t *rp, void *usrbuf, size_t maxlen){
    ssize_t rc;
    if ((rc = rio_readlineb(rp, usrbuf, maxlen)) < 0) {
        printf("Rio_readlineb_w failed\n");
        return 0;
    }
    return rc;
}
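HTTP/1.1 and the image loading problem
Before solving this, I looked around on GitHub. Of the few repos I saw, some sidestep the issue by parsing the request and sending an HTTP/1.0 request, as above. Others use readline indiscriminately, so for images and other files they still block in read: the complete data only arrives once the remote server closes the connection, which makes pages load extremely slowly.
Below I start from the HTTP protocol to look for a proper solution.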
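When the client sends
Connection: keep-alive
the connection stays open after the response, so the server has to mark the end of the body in some other way. One way is to reply with Transfer-Encoding: chunked; a response in chunked form does not use content-length.
content-length gives the size of the response entity body in bytes. In HTTP terms, it is the value of the Content-Length response header field.
- When the browser and the web server use a persistent (keep-alive) HTTP connection and the server does not use chunked transfer encoding, it must send a Content-Length header in every response giving the length of the entity body, so that the client can tell where the previous response ends.
- When the connection is not keep-alive, i.e. the usual short-lived connection, the server simply closes the connection and no length is needed.
- When the server returns dynamically generated content, there is usually no content-length.
- For a static page, content-length is present.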
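So the data coming back from the server cannot be read blindly; the header has to be parsed. The response is processed in the following steps:
- Read the header line by line. A line containing only \r\n marks the end of the header.
  - A Content-Length: entry means the length is given explicitly; record it.
  - A Transfer-Encoding: chunked entry means the body uses chunked encoding, which is handled later with readline.
- Read the body.
  - For chunked encoding, read directly with readline. Reading 0\r\n means the current body has ended; exit the loop.
  - With a content-length, compute the number of bytes to read each time as read_size = MAXLINE > content_length ? content_length : MAXLINE, then call readnb to read exactly that many bytes. Once the specified number of bytes has been read, the body has ended; exit the loop.
This solves the problem caused by keep-alive.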
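The header step above can be sketched as a small classifier that looks at one header line at a time. The names body_framing, resp_info, and scan_header_line are assumptions made for this sketch, not part of the proxy's code. When both headers appear, it lets chunked win, which is the precedence HTTP/1.1 specifies.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* How the end of the body will be detected (names are hypothetical). */
enum body_framing { FRAMING_UNKNOWN, FRAMING_LENGTH, FRAMING_CHUNKED };

struct resp_info {
    enum body_framing framing;
    unsigned long content_length;
};

/* Case-insensitive search for "chunked" in a header value. */
static int mentions_chunked(const char *value)
{
    for (; *value != '\0'; value++)
        if (strncasecmp(value, "chunked", 7) == 0)
            return 1;
    return 0;
}

/* Inspect one header line and update info.
 * Returns 1 for the bare "\r\n" line that ends the header, else 0. */
int scan_header_line(const char *line, struct resp_info *info)
{
    static const char cl[] = "Content-Length:";
    static const char te[] = "Transfer-Encoding:";

    if (strcmp(line, "\r\n") == 0)
        return 1;

    /* sizeof(x) - 1 is the header name's length without the '\0'. */
    if (strncasecmp(line, cl, sizeof(cl) - 1) == 0) {
        if (info->framing != FRAMING_CHUNKED) {
            info->content_length = strtoul(line + sizeof(cl) - 1, NULL, 10);
            info->framing = FRAMING_LENGTH;
        }
    } else if (strncasecmp(line, te, sizeof(te) - 1) == 0) {
        if (mentions_chunked(line + sizeof(te) - 1))
            info->framing = FRAMING_CHUNKED;
    }
    return 0;
}
```

A caller would start from { FRAMING_UNKNOWN, 0 }, feed each line returned by readline, and stop at the first call that returns 1. If the framing is still FRAMING_UNKNOWN at that point, the body can only be delimited by the server closing the connection.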
/* Receive response from target server and forward to client */
Rio_readinitb(&rio, serverfd);
/* Read head */
while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
    /* Fix bug of return value when response line exceeds MAXLINE */
    if (len == MAXLINE && buf[MAXLINE - 2] != '\n') --len;
    /* a bare "\r\n" means the head has ended */
    if (!strcmp(buf, "\r\n")){
        Rio_writen_w(client->socketfd, buf, len);
        break;
    }
    if (!strncasecmp(buf, "Content-Length:", 15)) {
        sscanf(buf + 15, "%u", &content_length);
        chunked = False;
    }
    /* sizeof counts the trailing '\0'; subtract 1 so only the header name is compared */
    if (!strncasecmp(buf, "Transfer-Encoding:", sizeof("Transfer-Encoding:") - 1)) {
        if(strstr(buf,"chunked")!=NULL || strstr(buf,"Chunked")!=NULL)
            chunked = True;
    }
    /* Send the response line to client and count the total len */
    Rio_writen_w(client->socketfd, buf, len);
    recv_len += len;
}
/* Read body */
if(chunked){
    /* Transfer-Encoding: chunked */
    while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
        /* Fix bug of return value when response line exceeds MAXLINE */
        if (len == MAXLINE && buf[MAXLINE - 2] != '\n') --len;
        /* Send the response line to client and count the total len */
        Rio_writen_w(client->socketfd, buf, len);
        recv_len += len;
        /* End of response */
        if (!strcmp(buf, "0\r\n")) {
            /* "0\r\n" was already forwarded above; send the final "\r\n" that closes the body */
            Rio_writen_w(client->socketfd, "\r\n", 2);
            recv_len += 2;
            break;
        }
    }
}
else{
    read_size = MAXLINE > content_length?content_length:MAXLINE;
    while((len = Rio_readnb_w(&rio,buf,read_size))!=0){
        content_length -= len;
        recv_len += len;
        Rio_writen_w(client->socketfd, buf, len);
        if(content_length == 0) break;
        read_size = MAXLINE > content_length?content_length:MAXLINE;
    }
}
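One caveat with detecting the end of a chunked body by looking for a "0\r\n" line: chunk data is arbitrary bytes, so a binary payload may contain that sequence at the start of a line. The sketch below follows the chunk sizes instead. It works on a buffer that already holds the body, and the name chunked_body_len and its return convention are assumptions for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Index just past the next "\r\n" at or after pos, or 0 if none. */
static size_t line_end(const char *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 1 < len; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return i + 2;
    return 0;
}

/* Hypothetical helper: length of the complete chunked body that starts
 * at buf[0], including the last chunk and the closing blank line.
 * Returns 0 if the buffer does not hold the whole body yet and
 * -1 if the framing is malformed. */
long chunked_body_len(const char *buf, size_t len)
{
    size_t pos = 0;

    for (;;) {
        /* Chunk header: hexadecimal size, optional extensions, CRLF. */
        size_t eol = line_end(buf, len, pos);
        if (eol == 0)
            return 0;
        if (!isxdigit((unsigned char)buf[pos]))
            return -1;
        unsigned long size = strtoul(buf + pos, NULL, 16);
        pos = eol;

        if (size == 0) {
            /* Last chunk: skip optional trailer lines up to the blank line. */
            for (;;) {
                eol = line_end(buf, len, pos);
                if (eol == 0)
                    return 0;
                int blank = (eol - pos == 2);
                pos = eol;
                if (blank)
                    return (long)pos;
            }
        }

        /* The chunk data and its trailing CRLF must both be present. */
        if (size > len - pos || len - pos - size < 2)
            return 0;
        if (buf[pos + size] != '\r' || buf[pos + size + 1] != '\n')
            return -1;
        pos += size + 2; /* jump over the data without inspecting it */
    }
}
```

In a streaming proxy the same logic would read the size line with readline and then forward exactly that many bytes with readnb, so the chunk data itself is never scanned for line endings.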
Of course, this is not keep-alive in the true sense. To really keep the connection open and save a few TCP connection setups, the handler would need a loop that goes back to reading the next request from the client.
Back to the writen and readn functions. If the user clicks through to another page before the current one has finished loading, the current page's connection is closed and writen fails.
Reads and writes can fail for a variety of reasons. The most common read failure is an
errno = ECONNRESET
error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an
errno = EPIPE
error caused by writing to a connection that has been closed by its peer on the other end. This can occur, for example, when a user hits their browser’s Stop button during a long transfer.
First, handle this error case separately:
int Rio_writen_w(int fd, void *usrbuf, size_t n){
    if (rio_writen(fd, usrbuf, n) != n){
        printf("Rio_writen_w error\n");
        if(errno == EPIPE)
            /* the client has closed this connection */
            return CLIENT_CLOSED;
        return UNKOWN_ERROR;
    }
    return NO_ERROR;
}
Then replace every call to Rio_writen_w with
if(Rio_writen_w(client->socketfd, buf, len)==CLIENT_CLOSED){
    clienton = False;
    break;
}
When clienton is False, the remaining work can be skipped and control goes straight to the log step.
In the same way, change the read wrappers to
ssize_t Rio_readnb_w(rio_t *rp, void *usrbuf, size_t n,bool *serverstat){
    ssize_t rc;
    if ((rc = rio_readnb(rp, usrbuf, n)) < 0) {
        printf("Rio_readnb_w error\n");
        rc = 0;
        if(errno == ECONNRESET) *serverstat = False;
    }
    return rc;
}
ssize_t Rio_readlineb_w(rio_t *rp, void *usrbuf, size_t maxlen,bool *serverstat){
    ssize_t rc;
    if ((rc = rio_readlineb(rp, usrbuf, maxlen)) < 0) {
        printf("Rio_readlineb_w failed\n");
        rc = 0;
        if(errno == ECONNRESET) *serverstat = False;
    }
    return rc;
}
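Change the readline that reads from the client to
Rio_readlineb_w(&rio, buf, MAXLINE,&clienton)
Change the readline that reads from the server to
Rio_readlineb_w(&rio, buf, MAXLINE,&serveron)
and add a few checks on the state of the server and the client to avoid wasting resources.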
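The EPIPE case can be reproduced without a network. In the sketch below a pipe stands in for the client socket, which is an assumption made to keep the example self-contained; the function name errno_after_peer_close is likewise hypothetical. With SIGPIPE ignored, writing after the reading side has gone away fails with EPIPE instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write to a pipe whose read end is already closed.
 * Returns the errno reported by write, 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int errno_after_peer_close(void)
{
    int fds[2];

    /* Without this, the write below would raise SIGPIPE, whose default
     * action terminates the process. */
    signal(SIGPIPE, SIG_IGN);

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]); /* the "peer" goes away */

    errno = 0;
    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

With a pipe the error shows up on the first write. On a TCP socket the first write after the peer closes may still succeed, and EPIPE typically appears on a later one, which is why every write in the forwarding loop needs the check.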
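Why are sockets always described with an fd?
From the point of view of a Unix program, a socket is an open file with a corresponding descriptor.
Why, under HTTP/1.1, does it take a long wait (until interrupted) before the data can be read?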
Client 127.0.0.1 connected
error request
Client 127.0.0.1 connected
Finish parse http://www.baidu.com www.baidu.com 80
Interrupted and Rebegin
Interrupted and Rebegin
Interrupted and Rebegin
Interrupted and Rebegin
while (nleft > 0) {
    // execution gets stuck at this step?
    if ((nread = read(fd, bufp, nleft)) < 0) {
        if (errno == EINTR) /* interrupted by sig handler return */
            nread = 0; /* and call read() again */
        else
            return -1; /* errno set by read() */
    }
    else if (nread == 0)
        break; /* EOF */
    nleft -= nread;
    bufp += nread;
}
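Observation: under HTTP/1.1, execution does not return from the read function.
My guess is that HTTP/1.1 uses persistent connections, so no EOF arrives, and the code has to decide for itself when to leave the while loop.
Solved; see Part 3.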
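The behaviour can be seen on a local socket pair, used here as a stand-in for the server connection (an assumption of this sketch, as are the names read_exact and read_from_open_connection). While the writing end stays open, no EOF arrives, so the reader returns promptly only if it asks for exactly the number of bytes that were sent.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes unless EOF or an error comes first.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_exact(int fd, char *buf, size_t n)
{
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue; /* interrupted: call read() again */
            return -1;
        }
        if (r == 0)
            break; /* EOF: the peer closed the connection */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Hypothetical demo: send a 5-byte "body" over a connection that stays
 * open, then read n bytes back. n must not exceed 5: asking for more
 * would block, because the open connection never delivers EOF. */
ssize_t read_from_open_connection(char *out, size_t n)
{
    int sv[2];
    ssize_t r = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write(sv[0], "hello", 5) == 5)
        r = read_exact(sv[1], out, n); /* sv[0] is still open here */
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

This is the same reason the proxy has to take the byte count from Content-Length (or from the chunk sizes) rather than reading until read returns 0.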
A named (non in-memory) mutex picks up its previous value when it is opened again
Call sem_unlink first to remove the old name.
sem_unlink("mutex_host");
sem_unlink("mutex_file");
/* sem_open reports failure with SEM_FAILED, not NULL */
if((mutex_host = sem_open("mutex_host",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}
if((mutex_file = sem_open("mutex_file",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}