
I'm implementing an HTTP server and wonder whether there is a defined way for a server to decide when a bad request has ended, so that it can

  1. return the corresponding 400 status, and
  2. treat the data that follows as a new request and start a fresh parse attempt.

The only idea that comes to mind is to search the received data for the next thing that looks like a request line and start a new parse attempt from there. This is very ambiguous, though, since the data of a bad request may itself contain such request-line-like data without being intended as a separate, new request.

The same question arises for client-side parsing of malformed responses, so answers that take this case into account would be appreciated.

Reizo

2 Answers

0

The header section ends with \r\n\r\n. You simply parse each entry you need and split it into its parts, e.g. with strtok or strstr, or manually.
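A minimal sketch of that splitting step, written in Python rather than C for brevity. The function name `split_head` and the choice to lower-case field names are my own assumptions, not something the RFC prescribes:

```python
def split_head(buffer: bytes):
    """Split a received buffer at the blank line that ends the header section.

    Returns (start_line, headers, rest), or None while the terminating
    \r\n\r\n has not arrived yet. Raises ValueError on a header line
    that has no "name: value" shape.
    """
    end = buffer.find(b"\r\n\r\n")
    if end == -1:
        return None  # header section still incomplete, keep reading

    head, rest = buffer[:end], buffer[end + 4:]
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("ascii")

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if not sep or not name:
            raise ValueError("malformed header line: %r" % line)
        # Assumption: field names are stored lower-cased for easy lookup.
        headers[name.decode("ascii").lower()] = value.strip().decode("latin-1")
    return start_line, headers, rest
```

Whatever is left in `rest` is the start of the message body, whose length then has to come from the parsed headers.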

If you are asking more about the GET line:

The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).

  Note: Servers ought to be cautious about depending on URI lengths
  above 255 bytes, because some older client or proxy
  implementations might not properly support these lengths.
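As a rough sketch of how a server might apply the 414 rule quoted above (Python for brevity). `MAX_URI_LENGTH` and `request_line_error` are made-up names, and the limit is an arbitrary value I picked, not one taken from the RFC:

```python
from typing import Optional

# Hypothetical limit chosen for illustration only.
MAX_URI_LENGTH = 8000


def request_line_error(request_line: str) -> Optional[int]:
    """Return an error status for a request line, or None if it looks fine."""
    parts = request_line.split(" ")
    if len(parts) != 3:
        return 400  # not "method SP target SP version"
    _method, target, _version = parts
    if len(target) > MAX_URI_LENGTH:
        return 414  # URI longer than this server is willing to handle
    return None
```

For example, `request_line_error("GET /index.html HTTP/1.1")` gives `None`, while a target longer than the limit gives `414`.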

Please refer to RFC 2616 to make your web server react according to the standard.

N.B.: Make sure you are also ready to handle chunked transfer encoding afterwards if you want to support HTTP/1.1; without it your server will be limited to HTTP/1.0 at best.

yagmoth555
  • Can you link a source stating that the message body is terminated with \r\n\r\n? RFC 7230 Sections 3 and 3.3 seem to state otherwise. – Reizo May 07 '18 at 13:14
  • Also, imagine the message is malformed, e.g. because these line feeds are missing, or are missing in addition to some other malformation. I still could not judge when another request might start. – Reizo May 07 '18 at 13:20
  • I can't really find a quote saying that a message should be terminated with a null character either... However, judging by some experiments with some arbitrary servers, it seems that a server usually _closes_ the connection after getting a bad request. That might actually be a very uniform and unambiguous behaviour, since, in the end, it seems like there's no defined way to tell where an HTTP message has ended (or a new one has started) _if not_ described in the message (header) itself (which, again, cannot be interpreted properly due to the malformation). – Reizo May 07 '18 at 14:39
  • The null termination might be properly used in chunk encoded messages, but messages using `Content-Length` won't need that null character, so it's not a universal method for finding the end. – Reizo May 07 '18 at 14:44
  • @Reizo Anyway, we are getting outside your question, since you need to parse the request, which does not even concern the message body; I removed my other comment for that reason. – yagmoth555 May 07 '18 at 14:55
  • Yes, that still leaves the question open. I've added an answer of my own that collects the conclusions I came to. Maybe you could review it to check whether it's in any way problematic. – Reizo May 07 '18 at 15:15
0

After some consideration it became quite clear that there is no universally applicable way to determine the end of a malformed message, since a message's boundaries are described by the message itself (e.g. by the Content-Length header field), and that self-description is exactly what the recipient can no longer rely on once the message is malformed. If, for example, a response looked like this:

HTTP/1.1 200 OK
Content-Length: [ consider correct content length here ]
Content-Type: text/html
<html>
    <head>
        <title>Title</title>
    </head>
    <body>
HTTP OK status messages look like this:
HTTP/1.1 200 OK
    </body>
</html>

The client parser would most likely fail at the first <, since after the single line break following the Content-Type header it expects another header field name, and field names cannot contain <. Further, it should then (probably) not 'search' the following data for another valid HTTP response, since it might receive message bodies like the one above, which contains HTTP/1.1 200 OK without that being intended as a new response.

Thus the best reaction to a malformed HTTP message appears to be closing the connection, since any other attempt to interpret the data received afterwards is inevitably ambiguous.
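On the server side, this "give up and close" reaction could look roughly like the following Python sketch. The names `TOKEN`, `BAD_REQUEST`, `header_is_well_formed` and `react` are hypothetical, and the check only covers the shape of header lines, not a full HTTP parse. A client that hits a malformed response would simply close the connection without sending anything.

```python
import re

# Field names must be "tokens": no "<", no spaces, no other separators.
TOKEN = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

BAD_REQUEST = (
    b"HTTP/1.1 400 Bad Request\r\n"
    b"Connection: close\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)


def header_is_well_formed(head: bytes) -> bool:
    """Check that every line after the start line looks like 'name: value'."""
    for line in head.split(b"\r\n")[1:]:
        name, sep, _value = line.partition(b":")
        if not sep or not TOKEN.fullmatch(name):
            return False
    return True


def react(head: bytes):
    """Return (reply, keep_open) for a received header section.

    reply is None when the header section parsed and normal handling
    can continue on the same connection.
    """
    if header_is_well_formed(head):
        return None, True
    # Framing can no longer be trusted: answer 400 and close instead of
    # scanning the remaining bytes for something that looks like a request.
    return BAD_REQUEST, False
```

The caller would send `reply` (if any) and close the socket whenever `keep_open` is false, so nothing after the malformed message is ever interpreted.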

However, this is AFAIK not specified in any RFC, maybe because RFCs are more about defining standards and less about handling non-standard behaviour.

Reizo