Hypertext Transfer Protocol

From Hill2dot0
HTTP Communications Model

The Hypertext Transfer Protocol (HTTP) is the communications protocol used between a Web client (e.g., a browser, a spider) and a Web server. HTTP is designed to run over any underlying reliable transport. On the Internet, that means HTTP is implemented over the Transmission Control Protocol (TCP) and the server most commonly uses well-known port 80. HTTP identifies resources using URLs of the form http://<host>/<file-name>. HTTP version 1.1 (the version most widely used on the Internet) is described in RFC 2616.
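
As a rough illustration of the URL form described above, the following Python sketch splits an http:// URL into the host, port, and path that a client would need. The function name split_http_url and the example host names are assumptions made for illustration; the default of port 80 follows the text above.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # well-known port for HTTP, as noted above


def split_http_url(url):
    """Return (host, port, path) for an http:// URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("expected an http:// URL")
    host = parts.hostname
    # Fall back to the well-known port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    # An empty path means the root resource.
    path = parts.path or "/"
    return host, port, path
```

For example, split_http_url("http://www.example.com/index.html") yields the host www.example.com, port 80, and path /index.html.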

Although the protocol name suggests that HTTP is designed to convey HTML-formatted documents, HTTP can in fact be used to convey any resource (i.e., file) from a server to a client. That includes images, video, and even files containing programs.

In HTTP, the client (also known as the user agent) typically initiates the request to the Web server (also known as the origin server). HTTP is an ASCII-based communications protocol. The user agent request identifies the resource required and provides additional information to the origin server. The origin server responds with the requested resource and/or control information.

HTTP Messages

When a user agent wants a particular resource from a specific origin server, it initiates an HTTP session with the specified server. This happens, for example, when you click on a link on a web page. Embedded in the link is the identity of a resource (e.g., web page, file, etc.) that you want. The browser initiates a three-step process to get that resource loaded:

  1. Use DNS to resolve the host name to an IP address. Usually, your browser reports, "looking up host name."
  2. Use TCP to establish a connection to that IP address. Usually, your browser reports, "connecting to host name."
  3. Use HTTP to retrieve the resource.

This third step is really all about HTTP and takes the form of an HTTP request and response dialogue over the established TCP connection.
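
The three steps can be sketched in Python with the standard socket module. This is a minimal sketch rather than a full client: the function names resolve and fetch are assumptions for illustration, the request carries only a Host and a Connection header, and the whole response is read until the server closes the connection.

```python
import socket


def resolve(host, port=80):
    """Step 1: use DNS to resolve the host name to an address."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, _socktype, _proto, _canonname, sockaddr = infos[0]
    return family, sockaddr


def fetch(host, path="/", port=80, timeout=10.0):
    """Return the raw bytes of the HTTP response for GET <path>."""
    family, sockaddr = resolve(host, port)

    # Step 2: use TCP to establish a connection to that address.
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)

        # Step 3: use HTTP to retrieve the resource.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"  # ask the server to close when done
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))

        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.example.com", "/") on a machine with Internet access would return the status line, headers, and body as one byte string; the host name here is only a placeholder.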

HTTP Request Message

The HTTP request is an ASCII text message that includes:

  • The request line: This specifies the action to be taken, the resource in question, and the version of HTTP in use. The most common request is GET, to retrieve a particular server resource. Other options include HEAD (same as GET, but the server returns only the status line and headers, not the resource itself), POST (upload data for processing), PUT (uploads a resource), DELETE (deletes the specified resource), and TRACE (echoes back the request for diagnostic purposes). The resource in question is a file name and (possibly) a path to the file (if the file is in a subdirectory). The version of HTTP being used is identified using the syntax HTTP/<version> (e.g., “HTTP/1.1” for HTTP version 1.1).
  • Header(s): These MIME-like entries provide additional information to the origin server. All of them, except the Host header, are optional. Some example headers include:
    • Host: Identifies the host name of the origin server to which the request is being made. This header must be present in any HTTP 1.1 request, but the host name field may be blank.
    • Referer: Provides the URL of the document containing the link to the requested file. (The specification spells the header name this way, rather than “Referrer”.)
    • User-Agent: Identifies the type of browser being used (e.g., “Firefox/3.0.1” for Version 3.0.1 of the Firefox browser).
    • If-Modified-Since: This header includes a date and time field, and tells the server to send the requested file only if it has been modified since the supplied date and time. Most browsers maintain a cache on the local system, and every downloaded Web document is saved for some period of time. If the file has not been modified since it was last saved, the server will so indicate (status code 304, Not Modified) and the client will load the page from local storage.
  • A blank line: Each line ends with the carriage return and line feed ASCII characters. This line has only these two characters.
  • Message body: For most user agent requests, this field is blank. There are, however, user agent requests in which information is being sent to the origin server (e.g., PUT, POST, etc.). The information is carried in the message body.
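
The four parts listed above can be assembled into a request message along the following lines. This is a sketch under stated assumptions: build_request is a hypothetical helper, the host and path in the usage note are placeholders, and a Content-Length header is added only when a body is supplied.

```python
CRLF = "\r\n"  # carriage return + line feed ends every line


def build_request(method, path, host, headers=None, body=b"",
                  version="HTTP/1.1"):
    """Assemble request line, headers, blank line, and body into bytes."""
    # Request line: action, resource, HTTP version.
    lines = [f"{method} {path} {version}"]
    # Host is the one header HTTP/1.1 requires.
    lines.append(f"Host: {host}")
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many body bytes follow the blank line.
        lines.append(f"Content-Length: {len(body)}")
    # The extra CRLF produces the blank line that ends the headers.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body
```

For instance, build_request("GET", "/index.html", "www.example.com", {"User-Agent": "Firefox/3.0.1"}) produces a request line, two header lines, and the terminating blank line, with an empty message body.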

Here is a sample Request message from a user agent to a web server:

GET /<file-name> HTTP/1.1
If-Modified-Since: Fri, 31 Dec 1999 23:59:59 GMT
Referer: http://www.hill2dot0.com/wiki/index.php?title=Hypertext_Transfer_Protocol
Connection: Keep-Alive
User-Agent: Firefox/3.0.1
Host: www.hill.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*<CRLF>
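
To illustrate how an origin server might act on the If-Modified-Since header in a request like this one, here is a small Python sketch. The function name status_for_conditional_get is hypothetical, and treating an unparseable date as "send the resource anyway" is an assumption of this sketch, not a statement about any particular server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_conditional_get(if_modified_since, last_modified):
    """Return 200 to send the resource, or 304 if the cached copy is current.

    if_modified_since: raw header value, e.g. "Fri, 31 Dec 1999 23:59:59 GMT"
    last_modified:     timezone-aware datetime of the resource's last change
    """
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Assumption: ignore a date we cannot parse and send the resource.
        return 200
    if cached_at.tzinfo is None:
        # Treat a date without a usable zone as UTC so the comparison works.
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    # Modified after the client's copy was saved: send it again.
    return 200 if last_modified > cached_at else 304
```

With the date from the sample request, a resource last changed in 1996 would give 304 (Not Modified), while one changed in 2000 would give 200.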

HTTP Response Message

The origin server will then send an HTTP Response to the user agent. The response has the general syntax:

  • The status line: This specifies the general server response to the request. It begins with the HTTP version being used, followed by a specific status code, followed by an explanatory phrase. The code 200 indicates a successful request to which the server can respond. A code of 404 indicates the requested resource cannot be found. A complete list of status codes and the explanatory phrase can be found in the RFC.
  • Header(s): As with the request, these are MIME-like fields that provide additional information to the user agent. Some example response headers include:
    • Last-Modified: Indicates the time and date the origin server believes the requested resource was last updated. If the request included the If-Modified-Since header and this date is later than the date specified in the request, the resource will follow. Otherwise, the server answers with status code 304 (Not Modified) and the resource is not sent to the user agent.
    • Location: This is a form of redirect, informing the user agent of an alternate location where the requested resource can be found.
    • Proxy-Authenticate: If the status code 407 (Proxy Authentication Required) is sent, this response header must be included. It provides the user agent with information about how the authentication process is to be conducted.
    • Retry-After: If the status code indicates a lack of available service (e.g., status 503, Service Unavailable), this response header can be used to indicate how long the user agent should wait before retrying. It can be specified as a date and time, or as a number of seconds (e.g., 120 = wait two minutes).
  • A blank line: Each line ends with the carriage return and line feed ASCII characters. This line has only these two characters.
  • Message body: When the origin server is sending a resource, this is the field it is placed in. If the resource is too large to fit into a single TCP segment, TCP will create multiple segments, IP will carry them in separate packets, and TCP will ensure arrival of all elements and correct re-ordering at the user agent.
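
Because Retry-After can carry either a number of seconds or a date and time, a user agent has to handle both forms. The following Python sketch shows one way to do so; retry_delay_seconds is a hypothetical helper, and the optional now parameter exists only so the calculation can be checked against a fixed clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None):
    """Return how many seconds to wait, given a Retry-After header value."""
    now = now or datetime.now(timezone.utc)
    value = retry_after.strip()
    if value.isdigit():
        # Form 1: a plain number of seconds, e.g. "120".
        return float(value)
    # Form 2: a date and time, e.g. "Fri, 31 Dec 1999 23:59:59 GMT".
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date already in the past means there is no need to wait.
    return max(0.0, (when - now).total_seconds())
```

A value of "120" gives 120.0 seconds, matching the two-minute example above.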

Here is a sample response message from a server to a user agent:

HTTP/1.0 200 OK<CRLF>
Server: NCSA/1.4.2
Content-type: text/html
Last-modified: Mon, 12 Feb 1996 23:29:45 GMT
Content-length: 3260<CRLF>
<requested resource><CRLF>
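
A user agent receiving a response like this one has to split it back into its parts. The sketch below, with the hypothetical function name parse_response, assumes the complete response is already in memory as bytes and that the status line and headers are plain ASCII; header names are lower-cased because they are not case-sensitive.

```python
def parse_response(raw):
    """Split raw response bytes into (version, code, reason, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")

    # Status line: HTTP version, status code, explanatory phrase.
    parts = lines[0].split(" ", 2)
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Remaining lines are "Name: value" headers.
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body
```

Applied to a response whose status line is "HTTP/1.0 200 OK", the function returns the version "HTTP/1.0", the integer code 200, and the phrase "OK", along with a dictionary of headers and the message body.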

