Hyper Text Transfer Protocol (HTTP)
The Hyper Text Transfer Protocol is a text-based request-response client-server protocol. An HTTP client (e.g. a web browser such as Mozilla) sends an HTTP request to an HTTP server (e.g. the Apache HTTP server), which in return issues an HTTP response. HTTP headers are text-based: each header is written as a line of text.
HTTP/1.1 allows client-server connections to be pipelined: multiple requests can be sent (often in the same packet) without waiting for the server's responses. The only restriction is that the server MUST return the responses in the same order in which it received the requests. This enables greater efficiency, especially on revalidation.
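Pipelining as described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular browser does it: the helper name build_pipelined_requests and the second path /index.html are assumptions made for the example.

```python
def build_pipelined_requests(host: str, paths: list[str]) -> bytes:
    """Concatenate several GET requests so they can be written in one send()."""
    requests = []
    for path in paths:
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"  # the empty line ends each request's header block
        )
    return "".join(requests).encode("ascii")


# Two requests back to back; "/index.html" is a hypothetical second resource.
payload = build_pipelined_requests("www.freebsd.org", ["/", "/index.html"])
```

A client would write payload to the TCP connection in one go and then read the two responses in the same order as the requests.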
An encrypted variant named HTTPS is also available. It is often used where privacy of data is necessary, e.g. for online banking. HTTPS is in fact two protocols running on top of each other. The first is a security protocol such as SSL, TLS or PCT. The second, which runs on top of this security protocol, is HTTP. URLs starting with https:// are really only a shorthand notation for the end user. The web browser reads the URI scheme (https://), initiates the security protocol with the server and, once the secure connection is established, issues an HTTP request over it for the URI specified in the URL.
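The layering described above can be sketched in Python. The function names connection_plan and open_channel are hypothetical, and the sketch assumes the usual default ports (80 for http, 443 for https); open_channel needs network access and is therefore defined but not called here.

```python
import socket
import ssl
from urllib.parse import urlsplit

# Assumed default ports for the two URI schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_plan(url: str) -> tuple[str, int, bool, str]:
    """Derive (host, port, use_tls, request_target) from a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError(f"not an http(s) URL: {url!r}")
    port = parts.port or DEFAULT_PORTS[scheme]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, port, scheme == "https", target


def open_channel(host: str, port: int, use_tls: bool) -> socket.socket:
    """Open TCP first, then optionally layer TLS on top of it."""
    sock = socket.create_connection((host, port))
    if use_tls:
        context = ssl.create_default_context()
        # The HTTP request is later written to this wrapped socket.
        sock = context.wrap_socket(sock, server_hostname=host)
    return sock
```

The HTTP request itself is identical in both cases; only the channel it travels over differs.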
The Hyper Text Transfer Protocol (HTTP) was initiated at CERN in Geneva (Switzerland), where it emerged (together with the HTML presentation language) from the need to exchange scientific information on a computer network in a simple manner. The first public HTTP implementation only allowed for plain text information, and almost immediately became a replacement for the GOPHER service. One of the first text-based browsers was LYNX, which still exists today; a graphical HTTP client named NCSA Mosaic appeared very quickly. Mosaic was a popular browser back in 1994. Soon the need for a richer multimedia experience arose, and the markup language gained support for a growing multitude of media types.
Support for multiple media types was already part of the informational HTTP/1.0 specification published as RFC1945 back in 1996. As the community using HTTP grew at an incredibly fast pace, and thanks to usage experience gathered by the community and processed by experts, the need for a more formal definition of the HTTP protocol emerged. Hence HTTP/1.1 was published, first as RFC2068 in January 1997, which was superseded by RFC2616 in June 1999.
TCP: Typically, HTTP uses TCP as its transport protocol. The well-known TCP port for HTTP traffic is 80. An HTTP proxy often uses a different port; typical values are 81, 3128, 8000 and 8080. However, HTTP can use other transport protocols as well.
Request by an end-user's browser
This user wants to access the web site "www.freebsd.org", so they type http://www.freebsd.org into their browser and hit enter. After the usual DNS resolution to find the IP address for www.freebsd.org, a TCP connection to the web server is established (SYN; SYN,ACK; ACK). The very next thing the browser/client sends to the web server is the following plain text request:
GET / HTTP/1.1
Host: www.freebsd.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
If-Modified-Since: Mon, 09 May 2005 21:01:30 GMT
If-None-Match: "26f731-8287-427fcfaa"
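The server knows the browser/client has finished sending its request headers when it receives an empty line, i.e. a line consisting only of a carriage return + line feed (\r\n). For a GET request like this one, which has no body, that empty line also ends the request.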
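The end-of-headers rule can be sketched as a small parser that waits until the empty line has arrived. This is a simplified illustration, not the logic of any real server; the names HEADER_END and split_request_head are assumptions made for the example.

```python
HEADER_END = b"\r\n\r\n"  # last header line's CRLF followed by an empty line


def split_request_head(buffer: bytes):
    """Return (request_line, headers, rest) once the header block is complete.

    Returns None while the terminating empty line has not been received yet.
    """
    end = buffer.find(HEADER_END)
    if end == -1:
        return None
    lines = buffer[:end].decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, buffer[end + len(HEADER_END):]


partial = b"GET / HTTP/1.1\r\nHost: www.freebsd.org\r\n"
incomplete = split_request_head(partial)          # headers not finished yet
complete = split_request_head(partial + b"\r\n")  # empty line received
```

Header names are lowercased here because HTTP header names are case-insensitive.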
Response from the server
The response is also in plain text:
HTTP/1.1 200 OK
Date: Fri, 13 May 2005 05:51:12 GMT
Server: Apache/1.3.x LaHonda (Unix)
Last-Modified: Fri, 13 May 2005 05:25:02 GMT
ETag: "26f725-8286-42843a2e"
Accept-Ranges: bytes
Content-Length: 33414
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
The browser/client now knows that text/html is coming, and here it is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<!-- Rest of the HTML Page Here -->
</html>
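The browser/client knows the server is done sending its HTML (or other data) when it has received the number of body bytes announced in the Content-Length header (33414 in this example). Responses without a Content-Length header are delimited by chunked transfer encoding or by the server closing the connection.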
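For a response that carries a Content-Length header, like the one above, a client can count body bytes to decide when the message is complete. The following is a minimal sketch under that assumption; the function name body_complete is hypothetical, and chunked or connection-close delimited responses are deliberately not handled.

```python
def body_complete(buffer: bytes) -> bool:
    """Return True once the whole Content-Length framed body has arrived."""
    head, separator, body = buffer.partition(b"\r\n\r\n")
    if not separator:
        return False  # header block itself is still incomplete
    content_length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip().decode("ascii"))
    if content_length is None:
        raise ValueError("sketch only handles responses with Content-Length")
    return len(body) >= content_length


# Shortened, made-up response head with a 6-byte body for illustration.
response_head = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
)
still_waiting = body_complete(response_head + b"<htm")
finished = body_complete(response_head + b"<html>")
```

On a keep-alive connection this byte count is what lets the client tell where one response ends and the next begins.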
Wireshark's HTTP dissector is fully functional. In addition, you can get basic statistics about HTTP requests and responses using Wireshark's menu item Statistics/HTTP.
The HTTP dissector has some preferences; see HTTP_Preferences.
Example capture file
SampleCaptures/http.cap A simple HTTP request and response.
SampleCaptures/http_gzip.cap A simple HTTP request and a one-packet gzip Content-Encoded response. Try this capture if you are having problems decompressing Content-Encoded packets, as this works with the default preferences.
A complete list of HTTP display filter fields can be found in the display filter reference.
Show only the HTTP-based traffic:
http
Show only the famous "404: page not found" responses:
http.response.code == 404
Show only file data received over HTTP (the content of the responses):
You cannot filter on the HTTP protocol directly while capturing. However, if you know the TCP port used (see above), you can filter on that port.
Capture HTTP traffic over the default port (80):
tcp port 80
Capture HTTPS traffic (HTTP over SSL/TLS) on the default port (443):
tcp port 443
RFC1945 Hypertext Transfer Protocol -- HTTP/1.0
RFC2068 Hypertext Transfer Protocol -- HTTP/1.1 (obsoleted by RFC2616)
RFC2616 Hypertext Transfer Protocol -- HTTP/1.1