How to extract data from HTTP header in C?

Posted 2019-03-04 10:21

I want to extract the data section (the message body) from the buffer that my recv() call fills, in C (not C++).

I just need some suggestions. Given a response like this:

HTTP/1.1 200 OK\r\n
Date: Mon, 23 May 2005 22:38:34 GMT\r\n
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)\r\n
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT\r\n
ETag: "3f80f-1b6-3e1cb03b"\r\n
Content-Type: text/html; charset=UTF-8\r\n
Content-Length: 131\r\n
Connection: close\r\n
\r\n

<html>
<head>
<title>An Example Page</title>
</head>
<body>
  Hello World, this is a very simple HTML document.
</body>
</html>

how would I get the part that follows the headers? The whole response is stored in my buffer, and I specifically just want to extract the data (the source code of the page). Any ideas?

2 answers
一纸荒年 Trace。
#2 · 2019-03-04 10:51

You need to actually parse the data in order to know where the headers end, where the message data begins, and where the message data ends. The headers end with a \r\n\r\n (CRLF+CRLF, 0x0D 0x0A 0x0D 0x0A) byte sequence, so you have to keep reading until you encounter that terminator. Then you have to parse the headers to know how the rest of the message is encoded and how it is terminated. Refer to RFC 2616 Section 4.4 Message Length for the rules. They tell you HOW to read the remaining data and WHEN to stop reading it. The data might be chunked, compressed, or self-terminating. The Content-Type, Content-Encoding, and Transfer-Encoding headers tell you how to interpret the message data.
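As a rough sketch of the first step, the search for the header terminator can work on the raw byte count that recv() returned, so the buffer does not need to be NUL-terminated. The function name find_body is made up for illustration, and the sketch assumes you append each recv() result to one buffer and call it again after every read:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scans buf[0..len) for the CRLF CRLF sequence that
 * ends the headers. Returns a pointer to the first byte after it (the start
 * of the message data), or NULL if the terminator has not arrived yet.
 */
const char *find_body(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return buf + i + 4;
    }
    return NULL;
}
```

A NULL result just means "keep calling recv()". Rescanning from the start on every call is wasteful for large headers, but it keeps the sketch short.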

In your particular example, after reading the headers, per Section 4.4 you would retrieve the value of the Content-Length header, read exactly 131 bytes, stop reading, and then close the socket because of the Connection: close header. You would then retrieve the value of the Content-Type header, learn that the data is UTF-8 encoded HTML, and process it accordingly.
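For the Content-Length case only, the header lookup might look something like the sketch below. The names ieq and get_content_length are hypothetical, and the code deliberately ignores chunked encoding, repeated headers, numeric overflow, and trailing junk after the digits, so treat it as a starting point rather than a full parser:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compares n bytes ignoring ASCII case; header names are case-insensitive. */
int ieq(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/*
 * Hypothetical helper: walks the header lines in hdr[0..len) looking for
 * Content-Length. Stores the value in *out and returns 0 on success;
 * returns -1 if the header is absent or has no digits.
 */
int get_content_length(const char *hdr, size_t len, size_t *out)
{
    static const char name[] = "content-length:";
    const size_t nlen = sizeof name - 1;
    size_t pos = 0;

    while (pos < len) {
        /* Find the end of the current line (or the end of the buffer). */
        const char *eol = memchr(hdr + pos, '\n', len - pos);
        size_t line_len = eol ? (size_t)(eol - (hdr + pos)) : len - pos;

        if (line_len >= nlen && ieq(hdr + pos, name, nlen)) {
            size_t i = pos + nlen, end = pos + line_len, value = 0;
            while (i < end && (hdr[i] == ' ' || hdr[i] == '\t'))
                i++;                      /* skip whitespace after the colon */
            if (i == end || !isdigit((unsigned char)hdr[i]))
                return -1;
            while (i < end && isdigit((unsigned char)hdr[i]))
                value = value * 10 + (size_t)(hdr[i++] - '0');
            *out = value;
            return 0;
        }
        pos += line_len + 1;              /* move past the '\n' */
    }
    return -1;
}
```

With the response from the question, this would report 131, and you would then keep reading until that many bytes have arrived after the \r\n\r\n terminator.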

See the pseudo-code I posted in an earlier answer:

Receiving Chunked HTTP Data With Winsock

我想做一个坏孩纸
#3 · 2019-03-04 10:58

The headers end with \r\n\r\n. If the whole response is in the receive buffer and you put a '\0' at the end of the response, then you can use the following code to find the start of the data section:

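/* buffer must be NUL-terminated; strstr() is declared in <string.h> */
char *data = strstr( buffer, "\r\n\r\n" );
if ( data != NULL )
{
    data += 4;  /* skip past the \r\n\r\n terminator */
    // do something with the data
}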
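If it helps, here is one way the same idea could be wrapped up so that it also reports how long the data section is, which matters when the body is not text and may contain '\0' bytes. The name split_body is invented for this sketch, and it assumes the buffer has room for at least one byte beyond the len bytes received:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: buf holds len received bytes and must have capacity
 * for at least len + 1. On success the header block is left as its own
 * C string at buf, *body_len is set to the number of data bytes, and a
 * pointer to the data is returned. Returns NULL if \r\n\r\n is not found.
 */
char *split_body(char *buf, size_t len, size_t *body_len)
{
    buf[len] = '\0';                       /* make the buffer safe for strstr */
    char *sep = strstr(buf, "\r\n\r\n");
    if (sep == NULL)
        return NULL;

    *sep = '\0';                           /* cut the headers off from the data */
    char *body = sep + 4;
    *body_len = len - (size_t)(body - buf);
    return body;
}
```

Note that body_len is only the number of body bytes received so far. Whether that is the whole body still depends on Content-Length or the other framing rules.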